INTRODUCTION


OVERVIEW  OF  STATLIB 

STATLIB  is  a  direct  access  file  on  the  CDC  General  Purpose  Computers  at  NSWC 
under  UN=LIBRARY.  It  contains  over  50  programs  and  subroutines  for  statistical  data 
analysis  and  random  number  generation.  The  programs  are  initiated  interactively  via  a 
menu  driver  (STATMNU)  which  queries  the  user  for  the  name  of  the  program  he  desires 
to  run.  User  name,  charge  code,  input  file  and  printing  instructions  are  requested  and  then 
a  batch  job  is  submitted  for  him  accordingly.  The  results  are  returned  to  the  user’s  directory 
in  a  file  named  STATOUT  (or  a  user  specified  filename)  which  is  created  for  him. 

The  subroutines  are  initiated  within  the  user’s  main  program.  Their  call  lines  and 
arguments  are  described  in  this  document,  but  since  they  are  not  menu  driven,  they  do  not 
appear  in  the  menu  of  programs. 


ORIGIN  OF  STATLIB 

Statistical  analysis  has  been  used  at  NSWC  since  the  earliest  days  of  ordnance  testing. 
As  a  result  of  increased  requirements  for  and  increased  complexity  of  statistical  analysis, 
the  Mathematical  Statistics  Branch  (currently  the  Mathematical  Statistics  Staff,  E406)  was 
formed  in  K  Department  in  1963.  Much  of  the  computation  and  analysis  at  that  time  was 
conducted  on  mechanical  desk  top  machines  and  on  programs  written  on  an  as  needed 
basis.  Reliable  commercial  software  as  we  know  it  today  simply  did  not  exist.  Hence, 
statistical  software  became  a  by-product  of  the  branch  almost  from  its  very  beginning.  At 
the  onset,  very  little  actual  programming  was  done  within  the  branch.  The  usual  approach 
was  to  formulate  the  requirements  and  obtain  programming  support  from  the  Computer 
Programming  Division.  This  process  continued  well  into  the  70’s  when  programming  was 
conducted  within  the  branch.  The  upshot  of  all  this  is  that  many  different  programmers 
were  involved  in  the  emerging  software.  With  the  exception  of  those  programs  which  are 
documented  in  NSWC  Reports,  the  original  programmers  are  usually  anonymous.  As 
commercial  statistical  software  became  more  available,  more  reliable,  and  more  friendly, 
the  need  for  new  in-house  statistical  programs  was  considerably  reduced.  Currently,  only 
a  small  effort  within  the  Mathematical  Statistics  Staff  is  devoted  to  software  development. 


ESTABLISHMENT  OF  STATLIB 

During  the  phase-out  period  of  the  CDC  6700  in  late  1984,  it  was  evident  that  the 
myriad of statistical programs developed over the years could not and should not be
converted to the new general purpose computer (the current CDC 995). Some were designed
to  compute  descriptive  statistics  and  had  become  obsolete  with  respect  to  new  commercial 
software.  Some  were  simply  large  and  lethargic  by  today’s  standards  and  not  worthy  of 
the  effort  involved  in  conversion  and  checkout.  This  left  a  collection  of  about  30  programs 
deemed  worthy  of  retention.  In  addition,  it  left  a  collection  of  programs  that  would  offer 
better  utilization  in  the  form  of  subroutines.  With  the  obsolescence  of  card  files,  a  computer 
library  was  considered  the  most  efficient  means  of  storage  with  rapid  access.  While 
Mathematical  Statistics  Staff  members  would  probably  be  the  prime  users  of  such  a  library, 
its  establishment  would  make  the  software  accessible  to  all  NSWC  personnel.  With  respect 
to the programs, this establishment involved reviewing each program, updating where
appropriate, designing a comprehensive test case, and correcting any discovered errors. With
respect  to  the  subroutines,  it  involved  a  reconfiguration  of  old  programs  into  subroutines 
and  programming  of  newly  formulated  subroutines.  (Some  of  the  reconfigured  programs 
were  random  number  generators,  and  new  ones  were  written  to  form  a  complete  set  of  such 
generators for the common probability distributions.) Both the new and the old were
extensively tested for correctness. While error free software is a rare commodity, it is
believed  that  the  programs  and  subroutines  which  form  STATLIB  contain  very  few  errors. 
This  is  because  of  the  long  history  and  recent  testing  of  the  old  software  and  the  extreme 
care  taken  in  developing  the  new. 

The  user  should  bear  in  mind  that  this  library  is  not  a  complete  library  in  the  sense  of 
a  commercial  package.  For  example,  it  will  not  do  basic  descriptive  statistics  nor  can  it  be 
used to categorize or sort data. STATLIB should be regarded as a specialized set of
programs and subroutines which form an excellent complement to the commercial statistical
packages  available  at  NSWC.  Programs  in  STATLIB  either  do  what  the  commercial 
packages  do  not,  do  it  better,  or  simply  allow  the  user  to  have  more  control  over  what  is 
done. 


COMMERCIAL  STATISTICAL  PACKAGES  AT  NSWC 

The commercial statistical package which is currently available on the primary
mainframes at NSWC is SPSSX. It is an enhanced and extended version of its predecessor,
SPSS,  which  was  available  at  the  Center  on  the  CDC  6700  from  1982  through  1985.  This 
package  is  a  comprehensive  tool  for  managing,  analyzing,  and  displaying  information.  It 
can  take  data  from  almost  any  kind  of  file  and  extract  meaningful  information  using  a  wide 
variety  of  statistical  procedures.  The  authors  of  this  report  have  found  it  to  be  an  excellent 
data management tool. In addition, it contains some very creditable statistical programs;
for example, its descriptive statistics, multiple linear regression, and one way analysis of
variance are quite good. On the other hand, its treatment of nonlinear regression and
non-orthogonal analysis of variance is limited. In brief, SPSSX is an excellent data
management tool with provisions for basic statistical analysis, but for many specialized
procedures, one must turn
to  other  commercial  packages  and  STATLIB.  Unfortunately,  other  statistical  packages  are 
not  compatible  with  the  CDC  hardware  at  NSWC  for  reasons  to  be  discussed  below. 

There  are  three  major  comprehensive  mainframe  statistical  packages,  and  they  are  all 
geared  for  IBM  and  other  non-CDC  mainframes.  In  addition  to  SPSSX,  these  packages  are 
SAS  and  BMDP.  SAS  is  an  excellent  package  for  statistical  analysis  which  is  available  at 
most  universities.  However,  it  is  written  in  machine  language  for  non-CDC  mainframes 
and,  hence,  not  available  to  NSWC.  BMDP  leases  a  CDC  version  of  their  package,  but  it 
has  been  found  to  be  extremely  time  consuming  to  install  on  NSWC  hardware.  Therefore, 
of  the  major  commercial  statistical  packages  available  for  mainframes,  NSWC  is  essentially 
limited to SPSSX. Even with SPSSX, conversions of their enhancements for CDC
hardware are last on the list. For example, NSWC waited over a year longer for the release of
SPSSX  than  did  the  users  with  IBM  mainframes. 

Versions  of  all  three  of  the  above  named  packages  have  recently  become  available  for 
desk  top  personal  computers.  They  are  referred  to  as  SPSSPC+,  BMDPC,  and  SASPC  and 
in  general  are  less  comprehensive  than  their  mainframe  counterparts.  The  basic  version  of 
SPSSPC+  and  most  of  the  more  useful  BMDPC  programs  have  been  purchased  by  the 
Mathematical Statistics Staff. Of course, these are copy protected and subject to copyright
laws, which means that they are available at only a single work station.

In addition to the three major statistical packages cited for mainframe and PC use, there
are scores of less comprehensive and less widely known statistical software packages
currently available and scores more becoming available for personal computers. Some of
these are very good and inexpensive and have been purchased by the Mathematical
Statistics Staff.


USING STATLIB


LIBRARY  ORGANIZATION 

STATLIB  is  comprised  of  a  collection  of  (1)  34  programs  for  statistical  data  analysis 
and  probability  calculations  and  (2)  a  set  of  24  subroutines  for  random  number  generation. 
The  programs  are  subdivided  into  seven  categories;  namely,  regression  analysis,  goodness 
of fit analysis, discrete power evaluation, continuous power evaluation, probability
evaluation, confidence limit evaluation, and miscellaneous analysis. The subroutines are
grouped  into  two  categories  —  discrete  random  number  generators  and  continuous  random 
number  generators. 


HOW  TO  CALL  IT 

In order to access STATLIB the user must first attach and library STATLIB as
follows:

ATTACH,STATLIB/UN=LIBRARY.

LIBRARY,STATLIB.

If the user wishes to execute a STATLIB program, no other system libraries need be
attached. If the user wants access to the STATLIB subroutines, then the system library
MATHLIB  must  also  be  attached  and  libraried  under  UN=LIBRARY. 


INFORMATION  NEEDED  TO  RUN  IT 

The programs in STATLIB are initiated interactively via a menu driver. After
attaching STATLIB the command

STATMNU 

will  display  a  menu  consisting  of  the  seven  program  categories.  General  details  on  how  to 
proceed  at  this  stage  are  available  to  the  user  through  a  HELP  function  key.  If  no  help 
instructions  are  required  interactively,  the  user  will  select  the  category  of  programs  he 
wishes to access. A second menu will then appear listing all programs under that
category. A general description of any particular program in this category can be obtained at this
juncture  by  using  the  HELP  function  key  followed  by  the  number  of  the  program  for  which 
a  description  is  desired.  If  no  basic  program  descriptions  are  needed  interactively,  the  user 
selects  the  number  of  the  specific  program  he  wishes  to  execute. 

If  the  program  selected  is  the  first  program  to  be  executed  in  the  current  interactive 
session, the user will see the following appear on the screen:


MAKJCL


USER  NAME: 
CHARGE  CODE: 


Specify  values  and  press  NEXT  when  ready 


Once  this  information  has  been  provided,  a  second  screen  will  appear: 


STATLIB PROGRAM NAME


NAME OF DATA FILE
ROUTE OUTPUT TO PRINTER? (DEFAULT=NO)
OUTPUT FILENAME (DEFAULT=STATOUT)


Specify  values  and  press  NEXT  when  ready 


The  data  file  requested  is  the  name  of  the  input  file  which  contains  the  user’s  data  and  input 
instructions  in  accordance  with  the  selected  program’s  input  guide.  On  the  CDC  995  any 
name consisting of from 1 to 7 alphanumeric characters which begins with a letter is
permissible. The printing option gives the user the flexibility to route the output immediately
or  to  defer  printing  until  the  output  can  be  viewed  interactively  (i.e.,  input  errors  may  be 
present  or  the  output  may  not  be  as  desired).  The  user  is  also  given  the  option  of  specifying 
the  output  filename.  If  no  filename  is  specified,  the  name  STATOUT  will  be  used.  Once 
this program information has been furnished, a batch job is submitted and the results are
returned to the user’s permanent directory under the chosen filename. If additional STATLIB
programs  are  executed  within  the  same  interactive  session,  only  the  second  screen  above 
will  appear. 

Some  STATLIB  programs  have  the  capability  to  write  useful  information  onto  a  data 
file  during  execution.  For  example,  the  STATLIB  regression  program  GEMREG  gives  the 
user  the  option  of  creating  a  data  file  of  model  residuals  from  the  regression  fit.  For  these 
STATLIB  programs  the  second  screen  will  require  an  additional  user  input.  For  example, 
this screen will appear as follows for program GEMREG:


NAME OF DATA FILE:
ROUTE OUTPUT TO PRINTER? (DEFAULT=NO):
OUTPUT FILENAME (DEFAULT=STATOUT):
OUTPUT FILENAME FOR RESIDUALS (OPTIONAL):


Specify  values  and  press  NEXT  when  ready 


This  last  filename  request  gives  the  user  the  flexibility  to  store  the  model  residuals  from  the 
regression  fit  in  the  filename  of  his  choice.  The  input  guide  for  GEMREG  specifies  the 
contents  and  the  format  of  this  file  of  residuals. 

A  caution  regarding  the  output  filename  is  in  order.  If  the  user  does  not  specify  an 
output  filename  the  default  name  STATOUT  will  be  used.  This  file  will  reside  in  the  user’s 
permanent  directory.  Unless  it  is  purged,  it  will  remain  there  and  be  replaced  by  a  new 
version  of  STATOUT  each  time  a  STATLIB  program  is  executed.  It  is  not  recommended 
that  the  user  purge  STATOUT  each  time  a  STATLIB  program  is  run.  However,  care 
should  be  exercised  when  running  the  same  STATLIB  program  multiple  times  to  ensure 
that  the  version  of  STATOUT  being  viewed  is  the  version  which  corresponds  to  the  user’s 
current  run.  If,  for  example,  the  user  submits  a  run  whose  input  file  contains  errors  without 
realizing  it,  then,  depending  on  the  severity  of  the  error,  he  may  be  viewing  the  most  recent 
copy of STATOUT placed in his permanent directory when he last ran that program
successfully.

STATLIB does not provide a menu listing of its subroutines. These subroutines must
be called from the user’s main program after attaching both STATLIB and MATHLIB.


EXAMPLES


PROGRAM 

An  example  of  the  execution  of  one  of  the  programs  in  STATLIB  is  provided  here  as 
an  aid  to  the  user.  The  program  chosen  to  illustrate  STATLIB  usage  is  GEMREG,  the  first 
program  in  the  STATLIB  category  of  regression. 

The first step in executing any program in STATLIB is the preparation of an
appropriate data file. The data file contains the user’s data and input instructions in accordance
with the selected program’s input guide. Program input guides and other related
information are given in Section IV, DESCRIPTIONS AND INPUT GUIDES. Prior to execution
of  the  program  the  data  file  must  be  stored  in  the  user’s  permanent  directory. 

Next  the  user  must  follow  the  instructions  given  in  Section  II,  USING  STATLIB, 
which  require  applying  the  ATTACH  and  LIBRARY  commands  for  STATLIB.  Then  after 
typing STATMNU the user is requested to choose a program and provide answers to
several questions. One of these questions requests the name of the user’s data file. After the
data file name (and other required information) is given, the program is executed. The
output is returned to the user’s permanent directory under the file name STATOUT
(default) or a user specified file name. The output may also be routed directly to the printer.

Data for the GEMREG example was taken from Example 10.4 on page 362 in
Probability and Statistics for Engineers and Scientists by R. E. Walpole and R. H. Myers, Third
Edition, Macmillan Publishing Company. The data consists of 18 observations of two
regressor variables, $X_1$ and $X_2$, and a response variable, y. The particular values of
$X_1$ and $X_2$ represent nine unique design points and each design point was replicated
twice yielding
18  data  points.  A  data  file  containing  the  18  data  points  and  input  instructions  according 
to  the  GEMREG  input  guide  was  prepared  and  stored  as  a  permanent  file  under  file  name 
DATGEMX. 

The  contents  of  DATGEMX  are  given  on  page  11.  The  reader  may  wish  to  refer  to 
the  GEMREG  input  guide  as  he  views  that  page.  The  input  has  been  prepared  assuming  a 
second  order  regression  model.  Predicted  values  of  the  response  variable  were  requested. 
Also, 95% confidence limits for the mean response and a single future response were
requested at four synthetic points. A single rerun was indicated with the three second order
terms  being  deleted  from  the  model. 

The  GEMREG  output  for  the  input  data  file  DATGEMX  is  given  on  pages  12  -  18. 
This  is  the  output  that  would  be  returned  to  STATOUT,  assuming  the  default  option  was 
chosen.  There  was  no  request  made  to  write  model  residuals  onto  a  second  file.  The  system 
day  file  for  the  GEMREG  run  is  on  pages  18  and  19.  This  file  is  available  to  the  user  under 
a file name assigned by the system.


SUBROUTINE

Instructions  on  the  use  of  subroutines  in  STATLIB  are  also  given  in  Section  II,  USING 
STATLIB.  As  indicated  there  the  ATTACH  and  LIBRARY  commands  must  be  applied 
for  STATLIB  and  MATHLIB  for  access  to  STATLIB  subroutines. 

In  order  to  execute  a  subroutine  from  STATLIB  the  user  must  construct  a  main  or 
calling  program.  Argument  values  are  passed  between  the  main  program  and  subroutine 
through  the  subroutine  call  line.  Call  lines  and  other  descriptive  information  about  each 
subroutine  are  given  in  Section  IV,  DESCRIPTIONS  AND  INPUT  GUIDES.  The  user  is 
responsible for printing all argument values of the subroutine, should he wish to see them.


REGRESSION ANALYSIS

Regression  analysis  is  a  collection  of  statistical  techniques  for  investigating  and 
modeling  the  relationship  between  variables.  For  a  specific  problem,  a  regression  model 
containing  unknown  population  parameters  is  postulated.  The  unknown  parameters  are 
estimated from sample data, and statistical tests are performed to ascertain if all the
parameters in the postulated model are required, if the model is adequate, or if a relationship
between  the  variables  even  exists.  If  the  postulated  model  is  linear  in  the  parameters,  it  is 
referred to as a linear model. Otherwise, it is referred to as a nonlinear model. For
example,

$$y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon$$

is  linear  in  the  parameters  and,  hence,  is  a  linear  model.  On  the  other  hand, 

$$y = \beta_0 + \beta_1 \exp(\beta_2 X) + \epsilon$$

is  nonlinear  in  the  parameters  and,  hence,  is  a  nonlinear  model.  The  regression  programs 
in  STATLIB  are  linear  regression  programs.  Programs  for  nonlinear  regression  analysis 
have  been  purchased  commercially.  However,  it  should  be  noted  that  a  nonlinear  model  can 
oftentimes  be  transformed  to  a  linear  one,  and  linear  theory  can  then  be  applied  to  the 
transformed  model.  For  example, 

$$y = \exp(\beta_0 + \beta_1 X + \epsilon)$$

is nonlinear but intrinsically linear since the transformation

$$\ln y = \beta_0 + \beta_1 X + \epsilon$$

is  linear.  Hence,  linear  regression  analysis  could  be  applied  to  the  transformed  model. 
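
The transformation above can be sketched in a few lines. This is only an illustration, not a STATLIB routine: the function name fit_log_linear and the noise-free data are assumptions made for the example. It takes logarithms of the response and then applies ordinary simple linear regression to the transformed values.

```python
import math

def fit_log_linear(x, y):
    """Estimate b0, b1 in y = exp(b0 + b1*x + e) by least squares on ln(y)."""
    if any(v <= 0 for v in y):
        raise ValueError("responses must be positive to take logarithms")
    z = [math.log(v) for v in y]                 # transformed response ln(y)
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    b1 = sxz / sxx                               # slope on the log scale
    b0 = z_bar - b1 * x_bar                      # intercept on the log scale
    return b0, b1

# Hypothetical, noise-free data generated from b0 = 0.5 and b1 = 0.3.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
b0, b1 = fit_log_linear(x, y)
print(round(b0, 6), round(b1, 6))
```

Note that least squares is applied to ln y, so the error assumptions refer to the transformed model rather than to the original scale.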

Linear models in a regression framework come under the heading of the general linear
model

$$y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \epsilon.$$

In this model, y is referred to as the dependent (or response) variable, the $X_i$ as independent
variables (or regressors), the $\beta_i$ as regression coefficients, and $\epsilon$ as the random error. The
inclusion of $\epsilon$ accounts for the fact that the relationship between the response variable and
the regressors is not an exact functional relationship. The parameters $\beta_i$ are estimated by
the method of least squares on the basis of n > k response values $y_j$ for specified values of
the independent variables. It is usually convenient to express the general linear model in
matrix notation, i.e.,

$$y = X\beta + \epsilon$$

where y is the n x 1 vector of values for the response variable, $\beta$ is the (k + 1) x 1 vector
of regression coefficients, $\epsilon$ is the n x 1 vector of random errors, and X is the n x (k + 1)
matrix of values for the regressors. We write

$$X = \begin{bmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k1} \\
1 & X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{1n} & X_{2n} & \cdots & X_{kn}
\end{bmatrix}$$

where $X_{ij}$ is the value associated with the jth response for the ith regressor. In this notation,
the least squares estimates for the $\beta_i$ are

$$\hat{\beta} = (X'X)^{-1}X'y$$

where the ‘hat’ or circumflex is used to denote the estimate of a parameter.
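
As a minimal sketch of this estimate, assuming NumPy is available, the design matrix can be built by prepending a column of ones and the normal equations solved directly. The function name least_squares and the small data set are hypothetical and are not taken from STATLIB.

```python
import numpy as np

def least_squares(regressors, y):
    """Return beta_hat for y = X beta + e, with an intercept column added."""
    Xr = np.asarray(regressors, dtype=float)
    y = np.asarray(y, dtype=float)
    n = Xr.shape[0]
    X = np.column_stack([np.ones(n), Xr])      # n x (k + 1) design matrix
    # Solve (X'X) beta = X'y instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: two regressors, response generated without error.
Xr = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 1.0 + 2.0 * Xr[:, 0] - 0.5 * Xr[:, 1]
beta_hat = least_squares(Xr, y)
print(beta_hat)
```

Solving the linear system gives the same estimate as the inverse formula when X'X is nonsingular and is generally the better conditioned computation.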

The  problem  of  estimation  in  regression  is  usually  straightforward  provided  the  set  of 
influential  regressors  is  known.  In  most  practical  problems,  this  is  not  the  case.  Usually, 
one  has  a  pool  of  candidate  regressors,  and  the  first  problem  is  ascertaining  if  the  pool  is 
sufficient  to  build  an  adequate  model  (lack  of  fit  problem).  If  the  pool  is  sufficient,  one 
then  has  the  problem  of  finding  the  appropriate  subset  for  the  model  (variable  screening 
problem)  via  tests  of  hypotheses.  In  the  final  analysis,  we  have  the  problem  of  assessing 
the  worth  of  the  model  via  confidence  limits  on  the  regression  coefficients,  on  the  mean 
response, and on the predicted response. These analyses presume that the
underlying model assumptions have been met. The most critical of these assumptions is
that the probability distribution for the vector of errors has covariance matrix equal to
$\sigma^2 I$. If the assumptions are not met, the analysis becomes more complicated, and the
results more difficult to interpret. For further reading on the broad topic of regression, the
user is referred to the
classical  textbooks  on  the  subject,  References  1,  2,  3,  and  4. 
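
For orientation, the usual textbook forms of these limits are sketched below under the additional assumption of normally distributed errors. The symbols $x_0$ (the vector $(1, X_{10}, \ldots, X_{k0})'$ at a point of interest), $s^2$ (the error mean square in use), $\nu$ (its degrees of freedom), $c_{ii}$ (the ith diagonal element of $(X'X)^{-1}$), and m (the number of future observations being averaged) are notation introduced here; this is not a statement of how any particular STATLIB program computes its limits.

```latex
\begin{align*}
\hat{y}_0 &= x_0'\hat{\beta} \\
\text{coefficient } \beta_i:\quad
  & \hat{\beta}_i \pm t_{\alpha/2,\,\nu}\; s\,\sqrt{c_{ii}} \\
\text{mean response at } x_0:\quad
  & \hat{y}_0 \pm t_{\alpha/2,\,\nu}\; s\,\sqrt{x_0'(X'X)^{-1}x_0} \\
\text{average of } m \text{ future responses at } x_0:\quad
  & \hat{y}_0 \pm t_{\alpha/2,\,\nu}\; s\,\sqrt{\tfrac{1}{m} + x_0'(X'X)^{-1}x_0}
\end{align*}
```

With m = 1 the last line reduces to the familiar prediction limits for a single future response, which are always wider than the limits on the mean response.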

None  of  the  programs  in  STATLIB  are  capable  of  resolving  all  of  the  problems  in 
regression, but each one has special features which help resolve some of them.
Collectively, they can help the user solve most of his regression problems. Their features and
limitations  are  discussed  in  the  ensuing  pages. 

REFERENCES 

1.  Draper,  N.  R.  and  Smith,  H.  (1981),  Applied  Regression  Analysis,  Second  Edition, 
John Wiley & Sons, Inc.

2. Graybill, Franklin A. (1961), An Introduction to Linear Statistical Models, Volume
I,  McGraw-Hill  Book  Company,  Inc. 

3.  Montgomery,  Douglas  C.  and  Peck,  Elizabeth  A.  (1982),  Introduction  to  Linear 
Regression  Analysis,  John  Wiley  &  Sons,  Inc. 

4.  Neter,  John,  Wasserman,  William,  and  Kutner,  Michael  H.  (1985),  Applied  Linear 
Statistical Models, Richard D. Irwin, Inc.


GEMREG


PURPOSE 

Program GEMREG (General Multiple Regression) is a multiple regression program
designed in the late 70’s to provide more flexibility and more user options than were
available at the time. One of its primary features is its “all regressions” capability which
provides for the automatic computation of regression analyses on each regressor alone, each
pair, each triplet, etc. In a model with k regressors, this procedure performs $2^k - 1$ regression
analyses. While the program incorporates an “all regressions” testing scheme in an attempt
to determine a “best” regression equation, a recent enhancement of the procedure provides
a means of determining a “best” regression in the sense of Montgomery and Peck
(Reference 1, Chapter 7) and Myers (Reference 2, Chapter 4).


FEATURES 

GEMREG features fall into two categories - those standard for all runs and those
optional at the discretion of the user. The standard features in GEMREG include the
following:

* A correlation matrix showing the simple correlation between all pairs of variables.

* The inverse of X'X and the solution vector of estimates of the regression
coefficients.

*  The  determinant  of  X'X  and  a  check  on  its  inverse. 

*  An  ANOVA  (Analysis  of  Variance)  table  with  a  test  for  regression  significance 
and  a  test  for  lack  of  fit  (model  adequacy). 

Optional  features  include  the  following: 

*  A  table  of  the  predicted  values  and  residuals. 

*  Decomposition  of  pure  error  sum  of  squares. 

*  Automatic  generation  of  powers  and  cross  product  terms. 

*  Confidence  limits  on  the  expected  response  for  specific  values  of  the  regressors 
(referred  to  as  synthetic  points). 

*  Prediction  limits  on  the  response  for  specific  values  of  the  regressors  (synthetic 
points). 

* “All regressions” testing for the “best” regression equation.

* Computation and tabling of SSE (residual sum of squares), MSE (residual mean
square), $R^2$ (coefficient of determination), adjusted $R^2$, and Mallows’ $C_p$ statistic
for each of the $2^k - 1$ regressions in the “all regressions” procedure.

*  Hand  selected  reruns  to  enable  the  user  to  test  the  significance  of  any  subset  of 
regressors. 

* Creation of a file in the user’s permanent directory containing the observed
response y, predicted response $\hat{y}$, and residual $y - \hat{y}$ for all data points in format
(E15.9,5X,E15.9,5X,E15.9).

In  order  to  accommodate  the  “all  regressions”  feature,  the  number  of  regressors 
(original  +  generated)  is  limited  to  ten.  In  addition,  the  number  of  data  points  is  limited  to 
1000.  The  user  is  referred  to  Reference  3  for  a  detailed  discussion  of  the  program  and  to 
References  1  and  2  for  a  discussion  of  regression  criteria  and  the  use  of  “all  regressions” 
measures  in  determining  a  “best”  regression  equation. 
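
The “all regressions” measures can be illustrated with a short sketch, assuming NumPy is available. It is not GEMREG's algorithm: the function name all_regressions, the simulated data, and the use of the full-model mean square as the variance estimate in Mallows’ statistic are assumptions made for the example, following the common textbook definitions.

```python
from itertools import combinations
import numpy as np

def all_regressions(regressors, y):
    """Fit every non-empty subset of the k regressors (2**k - 1 fits)."""
    Xr = np.asarray(regressors, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = Xr.shape
    sst = float(np.sum((y - y.mean()) ** 2))      # corrected total sum of squares

    def sse_for(cols):
        # Design matrix: intercept column plus the selected regressors.
        X = np.column_stack([np.ones(n)] + [Xr[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    full_mse = sse_for(range(k)) / (n - k - 1)    # variance estimate from the full model
    table = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            p = size + 1                          # parameters, including the intercept
            sse = sse_for(cols)
            r2 = 1.0 - sse / sst
            table.append({
                "regressors": cols,
                "SSE": sse,
                "MSE": sse / (n - p),
                "R2": r2,
                "adjR2": 1.0 - (n - 1) / (n - p) * (1.0 - r2),
                "Cp": sse / full_mse - (n - 2 * p),
            })
    return table

# Hypothetical data: three candidate regressors, only two of them influential.
rng = np.random.default_rng(0)
Xr = rng.normal(size=(12, 3))
y = 2.0 + 1.5 * Xr[:, 0] - 1.0 * Xr[:, 1] + rng.normal(scale=0.1, size=12)
table = all_regressions(Xr, y)
print(len(table))   # number of subsets fitted, 2**3 - 1
```

The number of fits doubles with each added regressor, which is consistent with the limit of ten regressors noted above (1023 regressions at most).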


REFERENCES 

1.  Montgomery,  Douglas  C.  and  Peck,  Elizabeth  A.  (1982),  Introduction  to  Linear 
Regression Analysis, John Wiley & Sons, Inc., pp. 244-286.

2. Myers, Raymond H. (1986), Classical and Modern Regression with Applications,
Duxbury Press, pp. 101-136.

3. Taub, A. E. and Thomas, M. A. (1981), GEMREG - A General Multiple Regression
Program, NSWC TN 81-298, NSWC, Dahlgren, Virginia 22448.


INPUT  GUIDE 

The  user-created  input  file  consists  of  two  records  containing  standard  feature  control 
variables,  one  or  more  records  containing  optional  feature  control  variables,  and  the  data 
records  containing  the  data  points  (response  values  and  associated  regressor  values).  The 
position  of  the  response  variable  and  the  input  format  are  controlled  by  the  user  on  the 
second  control  record.  Enhancements  have  been  made  to  the  original  program  since  the  date 
of Reference 3. Therefore, in case of discrepancies between the input guide in Reference
3 and the one below, the guide below should take precedence.

Record
Type    Variable    Description                                    Columns   Format

1       ID          Problem description.                           1-72      9A8

2       NOBS        Number of data points.                         1-5       I5
                    (NOBS ≤ 1000)

        NCOL        Number of columns in the data matrix, i.e.,    6-10      I5
                    the number of variables read in (regressors
                    + response).
                    (NCOL ≤ 11)

        NDEP        Position of the dependent variable (column     11-15     I5
                    number in data matrix).

        NOP         Number of option records.                      16-20     I5

        IPPV        =1, print predicted values.                    21-25     I5
                    =0, suppress predicted values.

        IPLOF       =1, print decomposition of pure error          26-30     I5
                    sum of squares.
                    =0, suppress decomposition printout.

        MS          =R, use residual mean square for confidence    35        A1
                    limits and testing. (DEFAULT=R)
                    =P, use pure error mean square for
                    confidence limits and testing.

        FORM        Variable format (placed in parentheses) for    41-80     4A10
                    reading data matrix and synthetic points.

3       IOP         Option desired (right justify)                 1-2       A2
                    =P, power variables will be generated.
                    =CP, cross product variables will be
                    generated.
                    =CL, confidence limits will be computed.
                    =AR, “all regressions” will be generated.
                    =R, hand selected reruns will be executed.

        IOPDIR(I)   If IOP=P, IOPDIR(1) contains the index of      6-10      1X,2I2
                    the variable and IOPDIR(2) contains the        11-15     1X,2I2
                    power to which it is to be raised. Repeat as   16-20     1X,2I2
                    necessary in 5 column sets.                    etc.      ditto

                    If IOP=CP, IOPDIR(1) and IOPDIR(2)             6-10      1X,2I2
                    contain the indices of the pairs of variables  11-15     1X,2I2
                    to be crossed. (More than two variables may    16-20     1X,2I2
                    be crossed by using indices of generated       etc.      ditto
                    variables.) Repeat as necessary.

                    If IOP=CL, IOPDIR(1) contains the number       6-15      2(1X,I4)
                    of synthetic points to be read in and
                    IOPDIR(2) contains the number of future
                    observations in the average for which
                    prediction limits are desired.

                    If IOP=AR, IOPDIR(1)=1 if the printout for     10        I1
                    each regression is desired. If left blank, new
                    “all regressions” features will not be printed.

                    If IOP=R, IOPDIR(I) contains the index of      6-10      1X,I4
                    the ith variable to be deleted for the rerun.  11-15     1X,I4
                    Repeat as necessary.                           16-20     1X,I4
                                                                   etc.      ditto

        ALPHA       If IOP=CL, 1-ALPHA is the confidence           61-70     F10.0
                    level to be used. If IOP=AR, ALPHA is the
                    significance level to be used for “all
                    regressions” testing.



If  the  CL  option  has  been  specified,  record  4  contains  the  synthetic  points  (one  point 
per  record)  according  to  the  format  specified  by  FORM  on  record  2.  This  record  type  is 
repeated  as  necessary  and  must  immediately  follow  record  3  on  which  the  CL  option  was 
specified.  Omit  this  record  type  if  the  CL  option  has  not  been  specified. 

Record  type  5  contains  the  input  data  points  according  to  the  format  specified  by 
FORM  on  record  type  2. 

The request for the creation of a file in the user’s permanent directory containing y, ŷ, and
y - ŷ is not made on any of the input records. This request is made by supplying the
permanent file name for this file when prompted during GEMREG setup. The prompt will
read OUTPUT FILENAME FOR RESIDUALS (OPTIONAL): ____ . If a name is
supplied, the file will be created under that filename.
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COMMENTS 

If  one  has  control  over  the  input  data,  it  is  convenient  to  place  the  response  variable 
last.  This  eliminates  problems  in  exercising  the  P,  CP,  and  CL  options.  These  options  can 
be  exercised  if  the  response  is  not  last,  but  greater  care  must  be  taken  in  specifying  the 
indices  and  in  formatting  the  synthetic  points. 

User  selected  options  (record  type  3)  should  be  input  in  the  same  order  as  they  are 
listed  in  the  input  guide.  The  program  will  run  regardless  of  order.  However,  if  the  order 
shown  is  not  adhered  to,  there  is  no  guarantee  that  the  input  options  will  be  incorporated. 
For  example,  options  P  and  CP  generate  power  and  cross  product  terms,  respectively.  If 
these  options  are  desired  and  not  input  first  as  listed,  then  power  and  cross  product  terms 
may  not  be  generated  for  prior  listed  options.  On  the  other  hand,  if  they  are  input  first, 
power  and  cross  product  terms  will  be  generated  initially  and  remain  in  the  model  through 
the  exercise  of  all  subsequent  options  unless  removed  by  the  R  (rerun)  option. 
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DAMRCA 

PURPOSE 

Program DAMRCA (Dahlgren Multiple Regression Comprehensive Analysis) performs
a regression analysis for a general linear model containing up to 50 regressor
variables. DAMRCA was written in the early sixties and documented in 1966 in Reference
1. The program contains procedures which allow the analyst to evaluate the statistical
significance of a postulated model. A procedure is also available for determining if
specified regressors are required in the model. Two ranking procedures are available which
provide an ordering of the regressor variables with respect to conditional prediction power
for the response variable. By utilizing the procedures within DAMRCA, the analyst can
determine a statistically significant model for the response variable and a relative ranking
of the regressors for that model.

FEATURES 

DAMRCA features fall into two categories - those which are standard for all runs and
those which are optional at the discretion of the user. Standard features include the
following:

*  The  inverse  and  determinant  of  X'X  and  an  accuracy  check  on  the  inverse. 

*  Estimates  of  the  regression  coefficients  and  their  standard  deviations. 

*  An analysis of variance table with a test for statistical significance of the
regression model.

*  Multiple  correlation  coefficient. 

*  Predictions  and  prediction  errors  (postulated  model). 

*  Histogram  and  chi-square  goodness  of  fit  test  for  normality  of  prediction  errors 
(postulated  model). 

Optional  features  include  the  following: 

*  Automatic  generation  of  powers  and  cross  products  of  regressors. 

*  User  selected  reruns  which  allow  significance  testing  of  any  subset  of  regressors. 

*  Predictions  and  prediction  errors  for  reruns. 

*  Histogram and chi-square goodness of fit test for normality of prediction errors for
reruns.

*  Predictions and prediction standard deviations for a future response and for the
expected response at specified values of the regressors (for construction of
confidence limits).

*  Forward  selection  procedure  for  regressors  (IVOR). 

*  Backward  elimination  procedure  for  regressors  (BIVOR). 

For a detailed discussion of the program, see Reference 1. The forward selection and
backward elimination procedures along with many other topics related to regression
analysis are discussed in Reference 2.


REFERENCES 

1.  Abt, K., Gemmill, G., Herring, T. and Shade, R. (1966), DA-MRCA: A Fortran IV
Program for Multiple Linear Regression, NWL Report No. 2035, NSWC, Dahlgren,
VIRGINIA 22448.

2.  Montgomery, D. C. and Peck, E. A. (1982), Introduction to Linear Regression
Analysis, John Wiley & Sons, Inc.


INPUT  GUIDE 

A concise input guide for the input file is given below. A more detailed version is
provided in Reference 1. Some changes have been made to the original program since the
printing of Reference 1. Where the two versions disagree, the current input guide given
below should be followed. Record types 1, 2, 8 and 9 are mandatory and comprise the most
basic run that a user could submit. The other records listed below provide the user with
optional output as indicated.


Record
Type    Variable  Description                                    Columns  Format

1       PGLB      Problem description.                           1-72     9A8

2       IR        Number of original regressors.                 1-2      I2
                  (IR ≤ 50)

        IS        Number of generated regressors (powers and     3-4      I2
                  cross products of input regressors).
                  (IR + IS ≤ 50)

        NR        Number of reruns.                              5-7      I3

        MVP       Number of synthetic points (points which are   8-10     I3
                  not contained in the input data) for which
                  predictions and prediction standard
                  deviations will be computed.

        NDR       Number of input points for which predictions   11-13    I3
                  and prediction standard deviations will be
                  computed.

        NPE       =0, predictions, prediction errors and         15       I1
                      normality test of prediction errors will
                      be calculated for the postulated model
                      only.
                  =1, calculations designated above will be
                      done for reruns and IVOR/BIVOR runs.

        NDPO      =0, data coordinates will be printed in        16       I1
                      9F13.6. Prediction and prediction errors
                      will be printed in 2F15.6.
                  =1, data coordinates will be printed in
                      7E17.8. Prediction and prediction errors
                      will be printed in 2F15.6.
                  =2, data coordinates will not be printed but
                      prediction and prediction errors will be
                      printed in 2F15.6.

        IVORGO    =0, IVOR and BIVOR will not be used.           18       I1
                  =1, IVOR will be used.
                  =2, BIVOR will be used.
                  =3, both IVOR and BIVOR will be used.

        NFD       Maximum number of variables to be read from    19-20    I2
                  each input record for data point
                  coordinates.

        IBID      =0, identity matrix will be computed and       21       I1
                      checked for accuracy for all BIVOR runs.
                  =1, identity matrix and accuracy computations
                      will terminate in BIVOR after accuracy
                      criterion has been met.

        TOLI1     Accuracy criterion for identity matrix         23-31    E9.5
                  printout. 0.0001 is suggested value.

        TOLI2     Accuracy criterion for identity matrix         32-40    E9.5
                  acceptance. 0.001 is suggested value.

        FORM      Format for reading coordinates of data         41-80    5A8
                  points. No parentheses are needed and
                  specification should ignore columns 1-2 of
                  the input record. The format applies to a
                  single record only. If more than one record
                  is required for a data point, subsequent
                  records should use the same format.
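
The fixed-column layout of record type 2 can be sketched in a few lines of Python. This is only an illustration of the column positions listed above, not part of STATLIB: the function name `damrca_record2`, its keyword names, and the sample format string are hypothetical. The tolerances are written as plain decimals with an explicit decimal point, on the assumption that standard Fortran input editing lets an explicit decimal point override the `.5` in E9.5.

```python
def damrca_record2(ir, is_, nr=0, mvp=0, ndr=0, npe=0, ndpo=0,
                   ivorgo=0, nfd=0, ibid=0, toli1=0.0001, toli2=0.001,
                   form="F8.0,5F10.0"):
    """Build an 80-column DAMRCA record type 2 (hypothetical helper)."""
    if not (1 <= ir <= 50 and 0 <= is_ and ir + is_ <= 50):
        raise ValueError("IR and IR + IS must not exceed 50")
    if len(form) > 40:
        raise ValueError("FORM must fit in columns 41-80")
    line = (
        f"{ir:2d}{is_:2d}{nr:3d}{mvp:3d}{ndr:3d}"   # columns 1-13
        f" {npe:1d}{ndpo:1d}"                       # column 14 blank, NPE 15, NDPO 16
        f" {ivorgo:1d}{nfd:2d}{ibid:1d}"            # column 17 blank, columns 18-21
        f" {toli1:9.4f}{toli2:9.4f}"                # column 22 blank, columns 23-40
        f"{form:<40}"                               # columns 41-80, no parentheses
    )
    if len(line) != 80:
        raise ValueError("a field overflowed its columns")
    return line


if __name__ == "__main__":
    # Three original regressors, two generated ones, four values per data record.
    print(damrca_record2(ir=3, is_=2, nfd=4, form="4F10.0"))
```

The length check at the end catches any value that is too wide for its field, which would otherwise silently shift every later column.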


Record  type  3  is  used  for  generation  of  regressors  from  powers  and  cross  products  of  the 
original  (input)  regressors.  If  no  regressors  are  to  be  generated  this  record  is  omitted.  A 
maximum  of  10  regressors  can  be  used  to  form  a  product  term. 


3       IN(1,1)   Subscript of the regressor to be used as the   1-2      I2
                  first factor in the first product term.

        IN(1,2)   Subscript of the regressor to be used as the   3-4      I2
                  second factor in the first product term.

        . . .

        IN(1,10)  Subscript of the regressor to be used as the   19-20    I2
                  tenth factor in the first product term.

        The descriptions of the second, third and fourth product terms are entered in
        columns 21-40, 41-60 and 61-80, respectively, in the same format. If more than
        four terms are required, additional records are added as needed.
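
As an illustration of the 20-column blocks described above, the following Python sketch packs product terms four to a record. The function `damrca_record3` is hypothetical. It also rests on two assumptions that the input guide does not state: that a power is requested by repeating a subscript (for example, 1 1 for the square of regressor 1), and that unused factor positions may be left blank.

```python
def damrca_record3(terms):
    """Pack product terms into 80-column records, four terms per record.

    terms: list of lists of regressor subscripts, one inner list per
    generated regressor (hypothetical representation).
    """
    records = []
    for start in range(0, len(terms), 4):           # four product terms per record
        blocks = []
        for term in terms[start:start + 4]:
            if not 1 <= len(term) <= 10:
                raise ValueError("a product term uses 1 to 10 factors")
            cells = "".join(f"{sub:2d}" for sub in term)   # one I2 field per factor
            blocks.append(f"{cells:<20}")           # blank-fill unused factor slots
        records.append("".join(blocks).ljust(80))
    return records


if __name__ == "__main__":
    # x1*x2, x1 squared, x2 squared, x1*x2*x3, x3 squared (assumed encoding).
    for rec in damrca_record3([[1, 2], [1, 1], [2, 2], [1, 2, 3], [3, 3]]):
        print(repr(rec))
```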

Record  type  4  is  used  to  designate  IVOR  instructions.  If  IVOR  is  not  used,  this  record  must 
be  omitted. 


4       IQ        Number of regressors to be ranked by IVOR.     1-2      I2

        MI        Number of groups into which the regressors     3-5      I3
                  are to be divided for ordering within groups.
                  The subdivision into groups is based strictly
                  on the order of input and generation of
                  regressors.
                  (MI ≤ 25)

        NJ(1)     Number of regressors in the first group.       6-8      I3

        NJ(2)     Number of regressors in the second group.      9-11     I3

        . . .

        NJ(25)    Number of regressors in the twenty-fifth       78-80    I3
                  group.

Record  type  5  is  used  to  designate  BIVOR  instructions.  If  BIVOR  is  not  used,  this  record 
must  be  omitted. 

5       MB        The number of groups into which the            1-2      I2
                  regressors are to be divided for ordering
                  within groups. As with IVOR, grouping is
                  based on the order of input and generation of
                  the regressors.
                  (MB ≤ 25)

        LOT(1)    Number of regressors in the first group (last  3-5      I3
                  group to be ranked).

        LOT(2)    Number of regressors in the second group       6-8      I3
                  (next to last group to be ranked).

        . . .

        LOT(25)   Number of regressors in the twenty-fifth       75-77    I3
                  group (first group to be ranked).

Record type 6 is used to designate input points for which predictions and prediction
standard deviations will be computed. Entries refer to the points according to their order of
input and must be in numerically ascending order. If no input points are to be selected this
record must be omitted.


6       IKEEPR(1)   Number corresponding to the input order of   1-4      I4
                    the first selected input point.

        IKEEPR(2)   Number corresponding to the input order of   5-8      I4
                    the second selected input point.

        . . .

        IKEEPR(20)  Number corresponding to the input order of   77-80    I4
                    the twentieth selected input point.

        Additional records are needed if more than 20 points are selected.
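
A short Python sketch shows one way to lay out record type 6: twenty I4 fields per record, continuing onto additional records as needed. The helper name `damrca_record6` is hypothetical, and the check simply enforces the ascending-order rule stated above.

```python
def damrca_record6(points):
    """Format selected input-point numbers as I4 fields, 20 per record."""
    points = list(points)
    if points != sorted(points):
        raise ValueError("entries must be in numerically ascending order")
    if any(not 1 <= p <= 9999 for p in points):
        raise ValueError("each entry must fit in an I4 field")
    return [
        "".join(f"{p:4d}" for p in points[i:i + 20])   # up to 20 fields = 80 columns
        for i in range(0, len(points), 20)
    ]


if __name__ == "__main__":
    for rec in damrca_record6(range(1, 23)):   # 22 points need two records
        print(repr(rec))
```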

Record type 7 is used to input synthetic points for which predictions and prediction standard
deviations will be computed. If no synthetic design points are to be used, this record must
be omitted. The format specification for reading the coordinates of the data points is also
used to read the synthetic points. Therefore, a record for a synthetic design point should
be identical to a record for a data point except that the field for the response variable should
be blank.

Record type 8 is used to input the coordinates of the data points. The coordinates are
entered according to the format given on record 2 in columns 41-80. This format applies
starting in column 3 of the data point input records, since columns 1-2 are ignored by the
format. The coordinate of the response variable must be entered as the first coordinate for
each data point. The maximum number of data points is 9500.

9       MI        =99, indicating termination of input for data  1-2      I2
                  point coordinates.

Record  type  10  is  used  to  designate  rerun  models.  Each  rerun  must  be  indicated  by  a 
separate  record.  If  no  reruns  are  desired,  this  record  must  be  omitted. 


10      LOT(1)    =0, include constant in rerun.                 1        I1
                  =1, exclude constant from rerun.

        LOT(2)    =0, include first regressor in model.          2        I1
                  =1, exclude first regressor from model.

        . . .

        LOT(51)   =0, include fiftieth regressor in model.       51       I1
                  =1, exclude fiftieth regressor from model.
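
The rerun record is a string of 0/1 flags, with column 1 for the constant and column k+1 for regressor k. The Python sketch below builds one such record. The function `damrca_rerun_record` and its arguments are hypothetical names used only for illustration.

```python
def damrca_rerun_record(n_regressors, excluded, exclude_constant=False):
    """Build one rerun record: '1' excludes a term, '0' keeps it."""
    if not 1 <= n_regressors <= 50:
        raise ValueError("the model holds at most 50 regressors")
    bad = [k for k in excluded if not 1 <= k <= n_regressors]
    if bad:
        raise ValueError(f"unknown regressor indices: {bad}")
    flags = ["1" if exclude_constant else "0"]          # column 1: constant term
    flags += ["1" if k in excluded else "0"             # column k+1: regressor k
              for k in range(1, n_regressors + 1)]
    return "".join(flags)


if __name__ == "__main__":
    # Rerun a five-regressor model without regressors 2 and 4.
    print(damrca_rerun_record(5, {2, 4}))
```

Each rerun needs its own record, so a run with NR reruns would call such a helper NR times.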
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WEPORU 


PURPOSE 

Program  WEPORU  (Uncorrelated  Weighted  Polynomial  Regression)  is  designed  to  fit 
a  weighted  curvilinear  model  with  one  independent  variable  to  a  set  of  observations. 
Program  WEPORU  handles  the  case  in  which  the  error  terms  (differences  between  the 
observed  and  predicted  values  of  the  dependent  variable)  have  different  variances  but  are 
uncorrelated. If the error terms are correlated, program WEPORC should be used. Program
WEPORU complements two other regression programs in STATLIB, namely, GEMREG
and  DAMRCA.  These  programs  are  based  on  the  general  linear  model  and  assume  that  the 
error  terms  have  the  same  variance  for  all  observations  and  are  uncorrelated. 


FEATURES 

WEPORU  features  can  be  classified  as  either  standard  for  all  runs  or  optional  based  on  user 
preference.  Standard  features  in  WEPORU  include  the  following: 

*  Fitted  equation  showing  the  estimated  regression  coefficients. 

*  Two ANOVA (Analysis of Variance) tables: one table with a test for regression
significance showing the contribution made by each term in the model to the
overall regression sum of squares, and the other table displaying a test for lack of
fit (model adequacy).

*  A  table  displaying  the  raw  input  data,  the  estimated  (predicted)  values  for  the 
dependent  variable,  and  the  residuals. 

Optional  features  include  the  following: 

*  Confidence  limits  on  the  expected  response  for  specific  values  of  the  regressor. 

*  Prediction  limits  on  the  response  for  specific  values  of  the  regressor. 

These  values  of  the  regressor  may  be  either  its  original  levels  (input  points)  or  up 
to  100  synthetic  points  (points  which  are  not  contained  in  the  input  data). 

The user specifies the degree of the polynomial fit desired, which may not exceed the
smaller of 10 and the number of observations minus two. The number of observations
processed by WEPORU ranges from 3 to 750. The user must also specify an array of
“weights”, one weight associated with each distinct level of the independent variable. For
each level the weight is usually chosen to be the reciprocal of the variance of the response
variable at that level. If these variances are unknown they must be estimated from current
and/or past data. The user is referred to Reference 3 for a detailed discussion of WEPORU
(called WEPOR in Reference 3) and to References 1 and 2 for background information on
weighted regression.
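
When the variances are unknown, one common way to estimate them is from replicated observations at each level. The Python sketch below is an illustration only and is not taken from WEPORU: the function name is hypothetical, and it assumes every level has at least two observations with a non-zero sample variance.

```python
from collections import defaultdict
from statistics import variance


def reciprocal_variance_weights(x, y):
    """Return {level: 1 / sample variance of y at that level}."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)           # collect the replicates at each level
    weights = {}
    for level in sorted(groups):
        obs = groups[level]
        if len(obs) < 2:
            raise ValueError(f"level {level} has no replicates")
        s2 = variance(obs)              # sample variance, divisor n - 1
        if s2 == 0:
            raise ValueError(f"level {level} has zero sample variance")
        weights[level] = 1.0 / s2
    return weights


if __name__ == "__main__":
    x = [1, 1, 2, 2]
    y = [1.0, 3.0, 2.0, 6.0]
    print(reciprocal_variance_weights(x, y))
```

With few replicates per level such estimates are noisy, so pooling information from past data, as the text suggests, may be preferable.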


REFERENCES 

1.  Draper, N. R. and Smith, H. (1981), Applied Regression Analysis, Second Edition,
John Wiley & Sons, Inc., pp. 108-116.

2.  Myers, R. H. (1986), Classical and Modern Regression with Applications, Duxbury
Press, pp. 168-177.

3.  Shields, P. A. and Thomas, M. A. (1982), WEPOR: A Weighted Polynomial
Regression Program, NSWC TR 82-49, NSWC, Dahlgren, VIRGINIA 22448.


INPUT  GUIDE 

The  user  must  create  a  data  file  containing  the  following  record  types.  Record  type  1 
specifies a title for problem description. Record type 2 specifies the number of
observations, the desired degree of the polynomial model, and the value of an option variable
dictating  whether  or  not  confidence/prediction  limits  are  desired.  The  third  record  type  is 
included  only  if  confidence/prediction  limits  have  been  requested.  This  record  specifies  the 
number  of  synthetic  points  at  which  these  limits  are  to  be  computed,  a  number  between  0 
and  1  representing  one  minus  the  confidence  level  associated  with  these  limits,  and  the 
number  of  future  observations  on  which  the  prediction  limits  are  to  be  based.  Record  type 
4  contains  the  format  by  which  the  data,  the  independent  and  dependent  variable  values, 
are  to  be  read.  Record  type  5  contains  the  format  under  which  the  “weights”  are  to  be  read. 
Record  type  6  specifies  the  values  of  the  weights  themselves.  Record  type  7  contains  the 
input  data  in  pairs  -  first  the  independent  variable  value,  then  the  dependent  variable  value. 
This  record  is  repeated  as  often  as  necessary.  Record  type  8  is  included  only  if  synthetic 
points  are  to  be  specified.  This  record  contains  the  values  of  these  synthetic  points. 

Some  changes  have  been  made  to  the  original  program  since  the  date  of  Reference  3. 
For  this  reason  the  input  guide  specified  below  should  take  precedence  over  the  one  given 
in  Reference  3. 

Record
Type    Variable  Description                                    Columns  Format

1       ITITLE    Problem description.                           1-72     9A8

2       NOBS      Number of data points.                         1-5      I5
                  (3 ≤ NOBS ≤ 750)

        KMAX      Desired degree of polynomial model.            6-10     I5
                  (1 ≤ KMAX ≤ min(10, NOBS - 2))

        COPT      Confidence/prediction limit option.            11-15    I5
                  =0, no intervals
                  =1, confidence intervals only
                  =2, confidence and prediction intervals

Record type 3 is included only if COPT = 1 or 2.

3       NPTS      Number of synthetic points for                 1-5      I5
                  confidence/prediction limits. Set NPTS = 0
                  to use input points.
                  (NPTS ≤ 100)

        AR        AR = 1 - γ for 100γ percent limits.            6-10     F5.2
                  (0 < AR < 1.0)

        M         Number of future observations on which the     11-15    I5
                  prediction limits are based. (DEFAULT=1)

4       FORM1     Format used to read in the (independent,       1-80     8A10
                  dependent) pairs and the synthetic points.

5       FORM2     Format used to read in the “weights”.          1-80     8A10

6       W         Array of weights.                                       FORM2

7       X         Independent variable level.                             FORM1

        Y         Dependent variable observation.                         FORM1

Record 7 is repeated as often as necessary.

Record type 8 is included only if COPT = 1 or 2 and NPTS > 0.

8       XPTS      Synthetic point values.                                 FORM1
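
The three fixed-layout records at the top of a WEPORU input file can be sketched as follows. This is a hedged illustration: `weporu_header` is a hypothetical helper, and M is written as an I5 field immediately after AR (columns 11-15), which is an assumption based on the I5 width. Records 4 through 8 depend on the user's own FORM1 and FORM2 and are not generated here.

```python
def weporu_header(title, nobs, kmax, copt=0, npts=0, ar=0.05, m=1):
    """Return record types 1-2, plus record type 3 when COPT is 1 or 2."""
    if not 3 <= nobs <= 750:
        raise ValueError("NOBS must lie between 3 and 750")
    if not 1 <= kmax <= min(10, nobs - 2):
        raise ValueError("KMAX must lie between 1 and min(10, NOBS - 2)")
    if copt not in (0, 1, 2):
        raise ValueError("COPT must be 0, 1 or 2")
    records = [
        f"{title[:72]:<72}",                 # record 1: columns 1-72
        f"{nobs:5d}{kmax:5d}{copt:5d}",      # record 2: three I5 fields
    ]
    if copt in (1, 2):
        if not 0 <= npts <= 100:
            raise ValueError("NPTS must lie between 0 and 100")
        if not 0.0 < ar < 1.0:
            raise ValueError("AR must lie strictly between 0 and 1")
        records.append(f"{npts:5d}{ar:5.2f}{m:5d}")   # record 3: I5, F5.2, I5
    return records


if __name__ == "__main__":
    for rec in weporu_header("CUBIC FIT, RANGE DATA", 12, 3, copt=2, npts=4):
        print(repr(rec))
```

Because WEPORC uses the same first three records with a smaller NOBS limit, a similar helper with the upper bound changed to 100 would presumably serve there as well.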
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WEPORC 


PURPOSE 

Program  WEPORC  (Correlated  Weighted  Polynomial  Regression)  is  designed  to  fit  a 
weighted  curvilinear  model  with  one  independent  variable  to  a  set  of  observations.  Pro¬ 
gram  WEPORC  handles  the  case  in  which  the  error  terms  (differences  between  the 
observed  and  predicted  values  of  the  dependent  variable)  are  correlated.  If  the  error  terms 
are  uncorrelated  with  unequal  variances  program  WEPORU  should  be  used.  Program 
WEPORC  complements  two  other  regression  programs  in  STATLIB,  namely,  GEMREG 
and  DAMRCA.  These  programs  are  based  on  the  general  linear  model  and  assume  that  the 
error  terms  have  the  same  variance  for  all  observations  and  are  uncorrelated. 


FEATURES 

WEPORC  features  can  be  classified  as  either  standard  for  all  runs  or  optional  based  on 
user  preference.  Standard  features  in  WEPORC  include  the  following: 

*  Fitted  equation  showing  the  estimated  regression  coefficients. 

*  Two ANOVA (Analysis of Variance) tables: one table with a test for regression
significance  showing  the  contribution  made  by  each  term  in  the  model  to  the 
overall  regression  sum  of  squares,  and  the  other  table  displaying  a  test  for  lack  of 
fit  (model  adequacy). 

*  A  table  displaying  the  raw  input  data,  the  estimated  (predicted)  values  for  the 
dependent  variable,  and  the  residuals. 

*  The  lower  triangular  portion  of  the  weighting  matrix. 

Optional  features  include  the  following: 

*  Confidence  limits  on  the  expected  response  for  specific  values  of  the  regressor. 

*  Prediction  limits  on  the  response  for  specific  values  of  the  regressor. 

These  values  of  the  regressor  may  be  either  its  original  levels  (input  points)  or  up 
to  100  synthetic  points  (points  which  are  not  contained  in  the  input  data). 

The user specifies the degree of the polynomial fit desired, which may not exceed the
smaller of 10 and the number of observations minus two. The number of observations
processed by WEPORC ranges from 3 to 100. The user must provide a variance-covariance
matrix for the error terms (i.e., variances along the diagonal and covariances on the
off-diagonals). If the elements of this matrix are unknown, they must be estimated from
current and/or past data. A weighting matrix is computed in WEPORC from the input
variance-covariance matrix. The user is referred to Reference 3 for a detailed discussion of
WEPORC (called WEPOR2 in Reference 3) and to References 1 and 2 for background
information on weighted regression.
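
For orientation, the textbook generalized least squares estimator is given below. It is offered as background under the assumption that the weighting matrix W is the inverse of the input variance-covariance matrix V; this document does not state how WEPORC forms W, so Reference 3 should be consulted for the actual computation. Here X is the matrix of powers of the independent variable and y is the vector of observations.

```latex
\hat{\beta} = \left( X^{\top} W X \right)^{-1} X^{\top} W y,
\qquad W = V^{-1},
\qquad \operatorname{Var}(\hat{\beta}) = \left( X^{\top} W X \right)^{-1}
```

When V is diagonal the weighting matrix reduces to a diagonal of reciprocal variances, which is the uncorrelated case handled by WEPORU.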

REFERENCES 

1.  Draper, N. R. and Smith, H. (1981), Applied Regression Analysis, Second Edition,
John Wiley & Sons, Inc., pp. 108-116.

2.  Myers, R. H. (1986), Classical and Modern Regression with Applications, Duxbury
Press, pp. 168-177.

3.  Shields, P. A. and Thomas, M. A. (1982), WEPOR: A Weighted Polynomial
Regression Program, NSWC TR 82-49, NSWC, Dahlgren, VIRGINIA 22448.


INPUT  GUIDE 

The  user  must  create  a  data  file  containing  the  following  record  types.  The  first  record 
type  specifies  a  title  for  problem  description.  The  second  record  type  specifies  the  number 
of  observations,  the  desired  degree  of  the  polynomial  model,  and  the  value  of  an  option 
variable  dictating  whether  or  not  confidence/prediction  limits  are  desired.  The  third  record 
type  is  included  only  if  confidence/prediction  limits  have  been  requested.  This  record 
specifies  the  number  of  synthetic  points  at  which  these  limits  are  to  be  computed,  a  number 
between  0  and  1  representing  one  minus  the  confidence  level  associated  with  these  limits, 
and  the  number  of  future  observations  on  which  the  prediction  limits  are  to  be  based. 
Record  type  4  contains  the  format  by  which  the  data,  the  independent  and  dependent 
variable  values,  are  to  be  read.  Record  type  5  contains  the  format  under  which  the  elements 
of the variance-covariance matrix are to be read. Record type 6 specifies the values of the
matrix elements themselves. Record type 7 contains the input data in pairs - first the
independent variable value, then the dependent variable value. Record 7 is repeated as often
as  necessary.  Record  type  8  is  included  only  if  synthetic  points  are  to  be  specified.  This 
record  contains  the  values  of  these  synthetic  points. 

Some  changes  have  been  made  to  the  original  program  since  the  date  of  Reference  3. 
For  this  reason  the  input  guide  specified  below  should  take  precedence  over  the  one  given 
in  Reference  3. 

Record
Type    Variable  Description                                    Columns  Format

1       ITITLE    Problem description.                           1-72     9A8




NSWC  TR  89-97 


2       NOBS      Number of data points.                         1-5      I5
                  (3 ≤ NOBS ≤ 100)

        KMAX      Desired degree of polynomial model.            6-10     I5
                  (1 ≤ KMAX ≤ min(10, NOBS - 2))

        COPT      Confidence/prediction limit option.            11-15    I5
                  =0, no intervals
                  =1, confidence intervals only
                  =2, confidence and prediction intervals

Record type 3 is included only if COPT = 1 or 2.

3       NPTS      Number of synthetic points for                 1-5      I5
                  confidence/prediction limits. Set NPTS = 0
                  to use input points.
                  (NPTS ≤ 100)

        AR        AR = 1 - γ for 100γ percent limits.            6-10     F5.2
                  (0 < AR < 1.0)

        M         Number of future observations on which the     11-15    I5
                  prediction limits are based. (DEFAULT=1)

4       FORM1     Format used to read in the (independent,       1-80     8A10
                  dependent) pairs and the synthetic points.

5       FORM2     Format used to read in the variance-           1-80     8A10
                  covariance matrix row by row.

6       V         Variance-covariance matrix elements.                    FORM2

7       X         Independent variable level.                             FORM1

        Y         Dependent variable observation.                         FORM1

Record 7 is repeated as often as necessary.

Record type 8 is included only if COPT = 1 or 2 and NPTS > 0.

8       XPTS      Synthetic point values.                                 FORM1
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MROP 


PURPOSE 

Program MROP (Multiple Regression Using Orthogonal Polynomials) is a multiple
polynomial regression program which uses orthogonal polynomials to estimate the
regression coefficients and compute the regression sum of squares. It can accommodate as
many as four original regressors (independent variables) and generates the powers and cross
product terms internally. The program uses Forsythe’s recursive method for generating the
orthogonal contrast coefficients (Reference 1) and is applicable to non-equidistant levels of
the regressors and proportional frequency (or replication) of the response variable.
(Proportional replication is defined in the COMMENTS section.) MROP was written in the
mid 60’s to eliminate the problem of dealing with the large and/or ill conditioned X'X
matrices associated with multiple polynomial regression models. References 2 and 3
provide the theoretical basis for the application of orthogonal polynomials in regression,
and Reference 4 provides a discussion of the use of orthogonal polynomials in least squares
surface fitting.


FEATURES 

Each MROP run contains a listing of the input values for the regressors and the
response variable. Other features are optional and controlled by the user on record 2. These
include the following:

*  Least  squares  estimates  of  the  regression  coefficients. 

*  Single  degree  of  freedom  mean  squares,  tests  of  significance  on  individual  terms, 
and  a  test  for  overall  regression  significance. 

*  Printout  of  the  computed  orthogonal  contrast  coefficients. 

*  An ANOVA (Analysis of Variance) table which treats each regressor as a factor
in a factorial analysis.

*  Printout  of  the  residuals  and  a  check  on  the  residual  sum  of  squares. 

*  Backward  ranking  of  the  regressors  incorporating  the  admissibility  principle  and 
tests  to  determine  a  statistically  significant  model.  (The  admissibility  principle 
states  that  if  a  model  contains  the  pth  power  of  a  regressor,  it  must  also  contain 
all  powers  from  1  to  p-1.  It  applies  to  powers  of  individual  regressors,  cross 
products  of  regressors,  and  cross  products  of  powers  of  regressors.) 

*  Standard  deviations  of  the  mean  response  and  future  response  for  specified  values 
of  the  regressors  (synthetic  points)  for  use  in  confidence  and  prediction  limit 
calculations. 
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*  Hand  selected  reruns  to  enable  the  user  to  generate  models  which  contain  subsets 
of  the  terms  in  the  main  run. 
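
The admissibility principle described in the feature list can be checked mechanically. In the Python sketch below each model term is a tuple of exponents, one per regressor, so (2, 1) stands for the square of regressor 1 times regressor 2. The representation and the function name are hypothetical, and the treatment of cross products (every lower-order term must be present) is one reading of the principle, not a statement of how MROP implements it.

```python
from itertools import product


def is_admissible(model):
    """True if every lower-order term of every model term is also present.

    model: iterable of exponent tuples, one exponent per regressor.
    The constant term (all exponents zero) is not required to be listed.
    """
    terms = set(model)
    for term in terms:
        # Enumerate every exponent tuple that is componentwise <= this term.
        for lower in product(*(range(e + 1) for e in term)):
            if any(lower) and lower not in terms:
                return False
    return True


if __name__ == "__main__":
    print(is_admissible({(1, 0), (2, 0)}))          # linear and quadratic in x1
    print(is_admissible({(2, 0)}))                  # quadratic without the linear term
    print(is_admissible({(1, 0), (0, 1), (1, 1)}))  # cross product with both main terms
```

A check of this kind may be useful when choosing terms for hand selected reruns, since a rerun that drops a low power while keeping a higher one would violate the principle.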


REFERENCES 

1.  Forsythe,  George  E.  (1957),  “Generation  and  Use  of  Orthogonal  Polynomials  for 
Data-Fitting  with  a  Digital  Computer”,  J.  Soc.  Indust.  Appl.  Math.,  5,  pp.  75-88. 

2.  Kendall,  M.  G.  and  Stuart,  A.  (1961),  The  Advanced  Theory  of  Statistics,  Vol.  2, 
Hafner  Publishing  Company,  pp.  356-361. 

3.  Montgomery,  Douglas  C.  and  Peck,  Elizabeth  A.  (1982),  Introduction  to  Linear 
Regression  Analysis,  John  Wiley  and  Sons,  Inc.,  pp.  244-286. 

4.  Thomas, M. A. (1966), The Use of Orthogonal Polynomials in Least Squares
Surface Fitting over Rectangular Grids, NWL Technical Memorandum K-1/66, NSWC,
Dahlgren, VIRGINIA 22448.


INPUT  GUIDE 

The user-created input file consists of as many as eleven different records. The first
is simply a run identification record, and the second and third contain the number of
variables and replicates and the optional feature control variables. Records 4 and 5 contain
the formats for reading the input data, 6 contains the regressor values, and 7A and 7B contain
the response values. The remaining records contain synthetic points and the rerun
specifications if indicated on record 2.


Record
Type    Variable  Description                                    Columns  Format

1       ID        Problem description.                           1-72     9A8

2       NIV       Number of original regressors.                 1-5      I5

        N1        Highest degree to be generated for the first   6-10     I5
                  regressor.

        N2        Highest degree to be generated for the second  11-15    I5
                  regressor.

        N3        Highest degree to be generated for the third   16-20    I5
                  regressor.

        N4        Highest degree to be generated for the fourth  21-25    I5
                  regressor.

        LN1       Number of levels for the first regressor.      26-30    I5

        LN2       Number of levels for the second regressor.     31-35    I5

        LN3       Number of levels for the third regressor.      36-40    I5

        LN4       Number of levels for the fourth regressor.     41-45    I5

        INDREP    Number of equal replications per cell. If      46-47    I2
                  unequal, i.e., proportional, use a blank or
                  zero.

        KIND      =1, MSE used to estimate σ.                    48-49    I2
                  =2, pooled MSE used to estimate σ.
                  (Used in the formation of confidence and
                  prediction limits.)

        TEST      =0 or blank, testing requested.                50-51    I2
                  >0, no testing requested.
                  (Testing associated with model significance.)

        ISD       =0, no printout of orthogonal contrast         52-53    I2
                      coefficients.
                  >0, printout of orthogonal contrast
                      coefficients. (Ordered on increasing
                      degree within increasing level.)

        NSYNPT    Number of synthetic points. (For confidence    54-58    I5
                  (prediction) limits on the mean (future)
                  response.)

        ALF       Significance level for testing.                59-70    F12.11

        NEQ       Number of hand selected reruns.                71-75    I5

        ITAB1     =0         =1         =0         =1            79       I1

        IFAC      =0         =0         =1         =1            80       I1
                  Table 2    Tables     Table 1    Table 1
                  only.      1 and 2.   only.      only.
                  (Table 1 = Full factorial analysis ANOVA.
                  Table 2 = Regression analysis ANOVA which
                  results from ranking.)

Record 3 is not needed for the case of a single regressor. It is required for NIV = 2, 3, or 4
for equal or unequal replication.

Record
Type    Variable     Description                                    Columns  Format

3       NREP(1,1)    Marginal total number of observations for      1-5      I5
                     level 1 of regressor 1.

        NREP(2,1)    Marginal total number of observations for      6-10     I5
                     level 2 of regressor 1.

        ...

        NREP(LN1,1)  Marginal total number of observations for      ....     I5
                     level LN1 of regressor 1.

Repeat for regressors 2, 3, and 4. Each regressor defines a set, and each set
begins on a new record.


Record
Type    Variable  Description                                       Columns  Format

4       FMT1      Format for reading the values of the regressors   1-80     10A8
                  and synthetic points.

5       FMT2      Format for reading the values of the response     1-80     10A8
                  variable.

6       X(I,1)    I=1, 2,..., LN1. The LN1 levels of the first      ....     FMT1
                  regressor.

        X(I,2)    I=1, 2,..., LN2. The LN2 levels of the second     ....     FMT1
                  regressor.

        X(I,3)    I=1, 2,..., LN3. The LN3 levels of the third      ....     FMT1
                  regressor.

        X(I,4)    I=1, 2,..., LN4. The LN4 levels of the fourth     ....     FMT1
                  regressor.

Each  regressor  forms  a  set,  and  each  set  begins  on  a  new  record. 

Record 7A (I5) contains the number of cell values or replicates to be input on record 7B
(FMT2). (This record must be omitted for the case of equal replication, i.e., INDREP > 0.)
Record 7B contains all the replicates for that cell. Records 7A and 7B are paired for each
cell, and the order of these pairs is implied by the program. It must adhere to the following
scheme:

Lay out the data in a multi-way table, and let cell(i,j,k,l) designate the cell
containing the response values for the

ith level of regressor 1,
jth level of regressor 2,
kth level of regressor 3,
lth level of regressor 4.

With this arrangement, the order of the input for the response values is based
on the rules below.

*  Data input begins with cell(i,j,k,l) = (1,1,1,1).

*  Index i increases with each cell change until it reaches its maximum
value (LN1). The cycle is repeated as often as needed.

*  Index j increases with each index i cycle change. The cycle is repeated
as often as needed.

*  Index k increases with each index j cycle change. The cycle is repeated
as often as needed.

*  Index l increases with each index k cycle change.

An  example  will  be  given  in  the  COMMENTS  section. 
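
The ordering rules above (index i cycling fastest, the last index slowest) can be sketched in a few lines of Python. This is only an illustration for checking a data layout by hand; the function name `cell_input_order` is hypothetical and is not part of STATLIB.

```python
from itertools import product


def cell_input_order(levels):
    """Return cell index tuples in the order the response records are read.

    levels: number of levels per regressor, e.g. (LN1, LN2, LN3).
    The first index varies fastest and the last index slowest.
    """
    # itertools.product varies its LAST argument fastest, so the regressors
    # are passed in reverse order and each tuple is flipped back afterwards.
    ranges = [range(1, n + 1) for n in reversed(levels)]
    return [tuple(reversed(idx)) for idx in product(*ranges)]


if __name__ == "__main__":
    # Three regressors with 2, 3, and 2 levels.
    for cell in cell_input_order((2, 3, 2)):
        print(cell)
```

For three regressors with 2, 3, and 2 levels, the list begins (1,1,1), (2,1,1), (1,2,1), (2,2,1) and contains 12 cells.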

Record  8  is  used  only  if  confidence  (prediction)  limits  on  the  expected  (future)  response  for 
specific  values  of  the  regressors  (synthetic  points)  are  requested  on  record  2.  Record  8 
contains  the  synthetic  points  in  the  same  format  used  to  read  in  the  regressors. 

Record
Type    Variable        Description                                 Columns  Format

8       XS(1,(I))       The first synthetic point where (I) is the  ....     FMT1
                        NIV-tuple specifying the value of each
                        regressor in their natural order, i.e.,
                        1, 2, 3, and 4. (Confusion can be avoided
                        by not overspecifying FMT1.)

        ...

        XS(NSYNPT,(I))  The last synthetic point.                   ....     FMT1

Records 9 and 10 are the hand selected rerun records and are used only when reruns are
indicated on record 2 (NEQ > 0). Record 9 contains the number of terms to be deleted
from the original (full) model. Record 10 contains the description of the terms to be
deleted. Records 9 and 10 are paired for each rerun.


Record
Type    Variable  Description                                       Columns  Format

9       NTERM     Number of terms to be deleted from the full       1-5      I5
                  model.

10      IN        IN is dimensioned according to the number         1-8      4I2
                  of regressors in the full model. These integers
                  specify the powers of each regressor or
                  combination of regressors to be deleted.
                  Example: To delete $X_2$, insert bbb1bbbb
                  where b designates a blank. To delete
                  $X_2^2 X_3^4$, insert bbb2b4bb.
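
As a hedged illustration of the 4I2 layout of record 10, the following Python sketch builds the eight-column string from a list of powers. The helper name `rerun_term_record` is hypothetical, and writing zero powers as blanks is an assumption that relies on a blank integer field being read as zero.

```python
def rerun_term_record(powers):
    """Format one record 10 (4I2): the power of each regressor in the term to delete.

    powers: one integer per regressor, e.g. [0, 1, 0, 0] for the term X2.
    """
    if len(powers) > 4:
        raise ValueError("at most four regressors are allowed")
    fields = []
    # Pad to four fields so the record always fills columns 1-8.
    for p in list(powers) + [0] * (4 - len(powers)):
        if not 0 <= p <= 99:
            raise ValueError("each power must fit in a two-column field")
        # A zero power is left blank (assumed to be read as zero by I2).
        fields.append("  " if p == 0 else f"{p:2d}")
    return "".join(fields)


if __name__ == "__main__":
    # Show blanks as 'b' to match the notation used in the input guide.
    print(rerun_term_record([0, 1, 0, 0]).replace(" ", "b"))
    print(rerun_term_record([0, 2, 4, 0]).replace(" ", "b"))
```

The two calls reproduce the strings bbb1bbbb and bbb2b4bb given in the description of record 10.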
COMMENTS

MROP  was  written  in  the  pre-“computer  friendly”  era.  However,  the  data  input  need 
not  be  confusing  if  one  first  lays  out  the  data  in  a  multi-way  table.  This  makes  it  easy  to 
ascertain  the  correct  input  order  and  to  obtain  the  information  for  Record  3  regarding 
marginal  replication  totals.  The  following  example  with  3  regressors  and  20  responses 
illustrates  the  process: 


(X(I,J) denotes the Ith level of regressor J.)

X(1,3) = 60.

                         X(1,1) = 100.         X(2,1) = 200.

X(1,2) = 21.             cell (1,1,1)          cell (2,1,1)
                         10.5  7.6  9.1        8.4  8.6  6.2

X(2,2) = [illegible]     cell (1,2,1)          cell (2,2,1)
                         16.9                  10.8

X(3,2) = 48.             cell (1,3,1)          cell (2,3,1)
                         6.1                   8.5

X(2,3) = 39.

                         X(1,1) = 100.         X(2,1) = 200.

X(1,2) = 21.             cell (1,1,2)          cell (2,1,2)
                         15.3  17.6  14.7      20.3  12.6  16.3

X(2,2) = [illegible]     cell (1,2,2)          cell (2,2,2)
                         25.3                  22.7

X(3,2) = 48.             cell (1,3,2)          cell (2,3,2)
                         8.9                   12.4

Applying the rules for records 7A and 7B, the order of input would be (1,1,1), (2,1,1),
(1,2,1), (2,2,1), (1,3,1), (2,3,1), (1,1,2), (2,1,2), (1,2,2), (2,2,2), (1,3,2), (2,3,2). Hence, the
first three cycles of records 7A and 7B would be the following: the integer 3 in format I5
for 7A and 10.5, 7.6, 9.1 in format FMT2 for 7B, the integer 3 in format I5 for 7A and
8.4, 8.6, 6.2 in format FMT2 for 7B, and the integer 1 in format I5 for 7A and 16.9 in format
FMT2 for 7B.
It will also be instructive to define proportional frequency or replication and apply it
to the above example. This term describes a multivariate data set in which the marginal
totals of the number of responses determine the number of responses in any particular cell.
For the case of 3 regressors,

$$n_{ijk} = \frac{(n_{i\cdot\cdot})(n_{\cdot j\cdot})(n_{\cdot\cdot k})}{(n_{\cdot\cdot\cdot})^2}$$

where $n_{ijk}$ is the number of responses for the ith level of regressor 1, jth level of regressor
2, and kth level of regressor 3, and $n_{\cdot\cdot\cdot}$ is the total number of responses. The
$n_{i\cdot\cdot}$, $n_{\cdot j\cdot}$, and $n_{\cdot\cdot k}$ are the marginal total numbers of
responses summed over the dotted subscripts, i.e.,

$$n_{i\cdot\cdot} = \sum_j \sum_k n_{ijk}, \qquad
n_{\cdot j\cdot} = \sum_i \sum_k n_{ijk}, \qquad
n_{\cdot\cdot k} = \sum_i \sum_j n_{ijk}.$$

Applying this definition to the above example yields

$$n_{1\cdot\cdot} = 10, \quad n_{2\cdot\cdot} = 10,$$

$$n_{\cdot 1\cdot} = 12, \quad n_{\cdot 2\cdot} = 4, \quad n_{\cdot 3\cdot} = 4,$$

$$n_{\cdot\cdot 1} = 10, \quad n_{\cdot\cdot 2} = 10.$$

The  marginal  total  numbers  of  observations  must  be  determined  for  each  case  to  be  run  and 
input  on  record  3.  If  the  number  of  replications  is  not  proportional,  the  output  from  the 
program  will  be  erroneous. 
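
Because non-proportional replication produces erroneous output, it may be worth checking the cell counts before building record 3. The Python sketch below is an illustration only: the function name `is_proportional` is hypothetical, and the generalization of the three-regressor formula to any number of regressors (dividing by the total raised to the number of regressors minus one) is an assumption. The example counts are those of the 20-response example above.

```python
from itertools import product


def is_proportional(counts):
    """Check proportional replication.

    counts maps a cell index tuple such as (i, j, k) to its number of responses.
    Returns (flag, margins), where margins[d] maps each level of regressor d+1
    to its marginal total (the values needed for record 3).
    """
    total = sum(counts.values())
    ndim = len(next(iter(counts)))
    # Marginal totals: for each regressor, sum the counts over all other indices.
    margins = [{} for _ in range(ndim)]
    for cell, n in counts.items():
        for d, level in enumerate(cell):
            margins[d][level] = margins[d].get(level, 0) + n
    for cell in product(*(sorted(m) for m in margins)):
        expected = 1
        for d, level in enumerate(cell):
            expected *= margins[d][level]
        # Integer form of n_cell = (product of marginal totals) / total**(ndim - 1).
        if counts.get(cell, 0) * total ** (ndim - 1) != expected:
            return False, margins
    return True, margins


# Cell counts for the example with 3 regressors and 20 responses.
example_counts = {
    (1, 1, 1): 3, (2, 1, 1): 3, (1, 2, 1): 1, (2, 2, 1): 1, (1, 3, 1): 1, (2, 3, 1): 1,
    (1, 1, 2): 3, (2, 1, 2): 3, (1, 2, 2): 1, (2, 2, 2): 1, (1, 3, 2): 1, (2, 3, 2): 1,
}

if __name__ == "__main__":
    ok, margins = is_proportional(example_counts)
    print(ok, margins)
```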
CANON


PURPOSE 

Program  CANON  performs  a  canonical  analysis  for  second  order  response  surface 
models.  Canonical  analysis  is  used  as  a  tool  in  response  surface  methodology  to  transform 
the  estimated  response  function  into  canonical  form.  The  canonical  form  allows  the  analyst 
to  more  easily  interpret  the  estimated  relationship  between  the  regressor  variables  and  the 
response  variable.  The  program  can  be  run  using  two  different  forms  of  input.  The  input 
may  consist  of  the  coefficients  in  a  second  order  response  function,  in  which  case  the  usual 
canonical  analysis  is  performed.  An  alternative  is  to  input  the  observed  data  matrix  (design 
matrix and responses). In this event the response function coefficients, an analysis of
variance table and tests of significance for regression and lack of fit are computed in
addition to the usual canonical analysis. Reference 1 provides a good discussion of canonical
analysis and several numerical examples.


FEATURES 

The  standard  output  produced  by  the  program  under  either  input  option  includes: 

*  A  square  matrix  B  which  contains  the  coefficients  from  the  quadratic  form  of  the 
estimated  response  function. 

*  The characteristic roots of matrix B and the estimated response function in
canonical form.

*  The location of the stationary point in regressor-space.

*  The estimated response at the stationary point and a designation of local
maximum, local minimum or saddle point.

*  A  square  matrix  M  which  enables  the  analyst  to  relate  the  original  regressors  to 
the  new  canonical  variables. 

The  additional  output  produced  when  the  full  data  matrix  is  input  includes: 

*  The  coefficients  of  the  second  order  response  function. 

*  An  analysis  of  variance  table  including  significance  tests  for  regression  and  lack 
of  fit  from  the  second  order  model. 
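
The quantities listed above can be illustrated for the two-regressor case with a short Python sketch. It is not the CANON source code; it assumes the usual response surface conventions (B holds the pure quadratic coefficients on its diagonal and half the cross-product coefficient off the diagonal, and the stationary point is $x_s = -\tfrac{1}{2}B^{-1}b$). The function name `canonical_2d` and its argument names are hypothetical.

```python
import math


def canonical_2d(b0, b1, b2, b11, b22, b12):
    """Canonical analysis of y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2.

    Returns (stationary_point, response_at_stationary_point, roots, kind).
    """
    h = b12 / 2.0                       # off-diagonal element of the symmetric matrix B
    det = b11 * b22 - h * h
    if det == 0:
        raise ValueError("B is singular; there is no unique stationary point")
    # Stationary point x_s = -(1/2) * B^{-1} * b, written out for the 2 x 2 case.
    xs1 = -0.5 * (b22 * b1 - h * b2) / det
    xs2 = -0.5 * (b11 * b2 - h * b1) / det
    # Estimated response at the stationary point: y_s = b0 + (1/2) * x_s' b.
    ys = b0 + 0.5 * (xs1 * b1 + xs2 * b2)
    # Characteristic roots of a symmetric 2 x 2 matrix.
    mean = (b11 + b22) / 2.0
    radius = math.hypot((b11 - b22) / 2.0, h)
    roots = (mean - radius, mean + radius)
    if roots[1] < 0:
        kind = "local maximum"
    elif roots[0] > 0:
        kind = "local minimum"
    else:
        kind = "saddle point"           # roots of mixed sign (a zero root also lands here)
    return (xs1, xs2), ys, roots, kind


if __name__ == "__main__":
    # y = 10 - (x1 - 1)^2 - 2*(x2 - 2)^2, expanded into coefficient form.
    print(canonical_2d(1.0, 2.0, 8.0, -1.0, -2.0, 0.0))
```

In canonical form the fitted surface is the response at the stationary point plus the sum of each root times the square of the corresponding canonical variable, so the signs of the roots give the designation.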
REFERENCE

1.  Myers,  R.  H.  (1971),  Response  Surface  Methodology,  Allyn  and  Bacon,  Inc.,  pp. 
67-88. 


INPUT  GUIDE 


The  specifications  of  the  user-created  input  file  are  given  below.  Record  types  1  and 
2  are  mandatory.  Record  types  3  and  4  are  necessary  if  the  data  matrix  input  option  is 
chosen.  Record  type  5  is  necessary  if  the  second  order  response  function  coefficients  are 
input. 


Record
Type    Variable   Description                                      Columns  Format

1       IOP1       =1, input data matrix.                           1-5      I5
                   =2, input coefficients of second order
                   response function.

        NO         Number of original regressors, i.e., excluding   6-10     I5
                   squared and cross-product terms.
                   (NO < 10)

2       FORM       Format (in parentheses) for reading data         1-80     8A10
                   matrix or coefficients from record type 4 or 5.
                   If IOP1=1, one data point consisting of NO+1
                   values will be input per record according to
                   FORM. If IOP1=2, two identifying integers
                   and one coefficient will be input per record.
                   For this case FORM must include an I-format
                   to read two identifying integers.

3       P          Number of data points. Omit this record if       1-5      I5
                   IOP1=2.
                   (P < 200)

If IOP1=1:

4       X(j), Y    Values of regressor variables and response       1-80     FORM
        j=1,..,NO  variable according to FORM. Repeat as
                   needed.
If IOP1=2:

5       I1, I2, C  Values of two coefficient identifiers and        1-80     FORM
                   coefficient. I1=i and I2=j represent subscripts
                   of regressors comprising product terms,
                   $X_i X_j$, in the response function. $X_0$ is
                   defined equal to 1. (Note: This notation
                   differs from the convention of the other
                   regression programs in the library.) The
                   following example is given for a model with
                   two regressors:

                   I1   I2   Term identified
                   0    0    constant
                   1    0    $X_1$
                   1    1    $X_1^2$
                   2    0    $X_2$
                   2    2    $X_2^2$
                   1    2    $X_1 X_2$
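
To make the identifier convention concrete, the following Python sketch sorts (I1, I2, C) triples into an intercept, a vector of linear coefficients, and a symmetric matrix of quadratic coefficients. It is an assumption-laden illustration: the function name `split_coefficients` is hypothetical, and halving the cross-product coefficient when filling the matrix follows the common response surface convention rather than anything stated for CANON.

```python
def split_coefficients(triples, no):
    """Sort (I1, I2, C) triples into (b0, linear, B) for `no` regressors.

    A zero identifier stands for X0 = 1, so (0, 0) is the constant term and
    (i, 0) is the linear term in X_i.
    """
    b0 = 0.0
    linear = [0.0] * no
    B = [[0.0] * no for _ in range(no)]
    for i1, i2, c in triples:
        i, j = sorted((i1, i2))         # a zero subscript, if any, comes first
        if j == 0:
            b0 = c                      # X0 * X0: constant term
        elif i == 0:
            linear[j - 1] = c           # X0 * Xj: linear term
        elif i == j:
            B[i - 1][i - 1] = c         # Xi * Xi: pure quadratic term
        else:
            # Cross-product term, split evenly across the symmetric positions.
            B[i - 1][j - 1] = B[j - 1][i - 1] = c / 2.0
    return b0, linear, B


if __name__ == "__main__":
    triples = [(0, 0, 1.0), (1, 0, 2.0), (1, 1, -1.0),
               (2, 0, 8.0), (2, 2, -2.0), (1, 2, 0.5)]
    print(split_coefficients(triples, 2))
```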
DURBWAT


PURPOSE 

Program  DURBWAT  performs  a  Durbin-Watson  2-tailed  test  of  hypothesis  on  the 
residuals  from  a  regression  fit.  The  Durbin-Watson  statistic  is  designed  to  test  for  the 
independence  of  the  error  terms  in  the  regression  model  by  testing  whether  the  first-order 
autocorrelation  of  the  residual  series  is  zero.  This  test  can  reveal  the  undesirable  presence 
of  a  first-order  autoregressive  autocorrelation  structure  between  residuals.  For  a  detailed 
discussion  of  the  Durbin-Watson  statistic,  see  References  1  and  2. 


FEATURES 

Built-in  tables  limit  the  number  of  residuals  (N)  which  can  be  handled  by  DURBWAT 
to  the  range  15  to  100,  inclusive.  The  regression  model  from  which  the  residuals  came  can 
have  up  to  5  independent  variables  (K).  Allowable  significance  levels  (ALPHA)  for  the 
hypothesis  test  are  0.10,  0.05,  and  0.02. 

The  values  of  N,  K,  and  ALPHA  are  displayed  in  the  output.  DURBWAT  computes 
and  prints  out  the  following  additional  quantities: 

*  The  value  of  the  Durbin-Watson  test  statistic  (DWSTAT) 

*  Values  of  an  upper  (DU)  and  lower  (DL)  bound 

*  Values  of  (4  -  DWSTAT)  and  (4  -  DU) 

These  quantities  are  used  as  follows  to  determine  and  print  out  the  result  of  the 
Durbin-Watson  test: 

(1)  If DWSTAT is less than DL or if (4 - DWSTAT) is less than DL, the test is
significant at the chosen ALPHA level.

(2)  If DU is less than DWSTAT and DWSTAT is less than (4 - DU), the test is
non-significant at the chosen ALPHA level.

(3)  If neither (1) nor (2) holds, the test is inconclusive at level ALPHA.
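
The decision rules above can be sketched in Python as follows. The statistic is computed from the standard Durbin-Watson definition (sum of squared successive differences of the residuals divided by the sum of squared residuals). The function names are hypothetical, and the bounds DL and DU are passed in as arguments because they come from tables that are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a time-ordered list of residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def dw_decision(dwstat, dl, du):
    """Apply rules (1) to (3) for the 2-tailed test, given tabled bounds dl and du."""
    if dwstat < dl or (4.0 - dwstat) < dl:
        return "significant"
    if du < dwstat < (4.0 - du):
        return "non-significant"
    return "inconclusive"


if __name__ == "__main__":
    # Illustrative residuals; the bounds below are placeholders, not table values.
    d = durbin_watson([1.0, -1.0, 1.0, -1.0])
    print(d, dw_decision(d, dl=0.8, du=1.2))
```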

REFERENCES 

1.  Durbin,  J.  R.  and  Watson,  G.  S.  (1950, 1951,  1971),  “Testing  for  Serial  Correlation 
in  Least  Squares  Regression”,  Parts  1  -  3,  Biometrika  37  (1950):  pp.  409  -  428;  38 
(1951):  pp.  159  -  178;  58  (1971):  pp.  1  -  20. 
2.  Montgomery, Douglas C. and Peck, Elizabeth A. (1982), Introduction to Linear
Regression Analysis, John Wiley & Sons, Inc., pp. 349 - 353.


INPUT  GUIDE 

The  user  must  first  obtain  one  or  more  sets  of  residuals  from  regression  fits  in  which 
the  response  variable  has  been  observed  sequentially  in  time.  The  user  must  then  create  a 
data  file  containing  the  following  record  types.  The  first  record  type  specifies  the  number 
of  cases  (ICASES)  to  be  run  (each  case  requiring  a  separate  set  of  residuals).  The  second 
record type specifies the number (K) of independent variables (IV’s) upon which the
regression was based, the number of data points (N), and the significance level (ALPHA)
chosen  for  the  2-tailed  test.  Next,  the  set  of  time-ordered  residuals  follows  on  record  type 
3,  one  residual  per  record.  If  ICASES  is  greater  than  one,  record  types  2  and  3  must  be 
repeated  for  each  case. 

More  specifically  the  required  data  file  is  constructed  as  follows: 


Record
Type    Variable  Description            Columns  Format

1       ICASES    Number of cases        1-5      I5

2       K         Number of IV’s         1-5      I5

        N         Number of residuals    6-10     I5

        ALPHA     Significance level     16-20    F5.0

3       Z(1)      1st residual           1-20     F20.0

        Z(2)      2nd residual           1-20     F20.0

        ...

        Z(N)      Nth residual           1-20     F20.0
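
A minimal Python sketch of building these records is shown below, assuming the column positions in the table above. The function name `durbwat_records` is hypothetical. The values are written with an explicit decimal point, which is assumed to be acceptable for the F5.0 and F20.0 fields.

```python
def durbwat_records(cases):
    """Build the input records for DURBWAT.

    cases: list of (k, alpha, residuals) tuples, one per case, where
    residuals is the time-ordered list of residuals for that case.
    """
    lines = [f"{len(cases):5d}"]                      # record 1: ICASES in columns 1-5
    for k, alpha, residuals in cases:
        # Record 2: K in columns 1-5, N in 6-10, columns 11-15 blank, ALPHA in 16-20.
        lines.append(f"{k:5d}{len(residuals):5d}{'':5s}{alpha:5.2f}")
        # Record 3: one residual per record in columns 1-20.
        lines.extend(f"{z:20.6f}" for z in residuals)
    return lines


if __name__ == "__main__":
    # A shortened, made-up residual series, used only to show the layout.
    for line in durbwat_records([(1, 0.05, [0.5, -0.25, 0.125])]):
        print(line)
```

The list of strings can then be written to the input file, one string per record.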
NEARNEB


PURPOSE 

Program  NEARNEB  (Near  Neighbor)  performs  an  interior  analysis  of  the  observations 
in  a  multiple  linear  regression  problem.  Estimates  of  the  coefficients  in  the  regression 
model  are  required  as  input  to  the  program  in  addition  to  the  data  points. 

The analysis consists of two procedures. The first procedure identifies remotely
located data points in the x-space (or regressor space) which exert considerable influence on
the least squares computations. The criterion used to identify the remote points is the value
of a statistic designated WSSDJ, which was proposed in Reference 1. WSSDJ is the
weighted sum of the squared standardized distance of point j from the centroid of the
x-space. The effect of influential points can then be examined by the analyst after refitting
the regression model without these points and observing any differences between the old
and new estimates of model coefficients and summary statistics.

The  second  procedure  identifies  pairs  of  data  points  which  are  designated  as  “near 
neighbors”  in  the  x-space.  The  criterion  used  to  identify  “near  neighbors”  is  the  value  of 
a  statistic  designated,  WSSD,  which  was  also  proposed  in  Reference  1.  WSSD  is  the 
weighted  squared  standardized  distance  in  x-space  between  two  points.  These  identified 
pairs  of  points  are  used  by  the  program  to  compute  an  estimate  of  the  variance  for  pure 
error.  This  estimate  can  be  compared  to  the  residual  mean  square  from  the  regression 
analysis  to  ascertain  if  significant  lack  of  fit  exists. 

The program was obtained from Reference 3. Examples and discussion of the
application of the procedures are given in Reference 2.
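
The two statistics can be sketched in Python as follows. The formulas are not given in this guide; the sketch assumes the forms commonly attributed to the cited references, namely that each coordinate difference is weighted by its estimated regression coefficient and scaled by the residual mean square, and that the standard deviation of pure error is estimated as 0.886 times the average absolute delta residual of the selected near-neighbor pairs. All function and argument names are hypothetical.

```python
def wssd(point_a, point_b, coeffs, mse):
    """Weighted squared standardized distance between two points in x-space.

    coeffs: estimated regression coefficients (b1, ..., bk), intercept excluded.
    mse:    residual mean square from the regression fit.
    """
    return sum((b * (u - v)) ** 2
               for b, u, v in zip(coeffs, point_a, point_b)) / mse


def wssdj(point, centroid, coeffs, mse):
    """Same distance, measured from a point to the centroid of the x-space."""
    return wssd(point, centroid, coeffs, mse)


def pure_error_sd(delta_residuals):
    """Estimate the standard deviation of pure error from near-neighbor pairs.

    delta_residuals: differences between the residuals of each selected pair.
    The factor 0.886 converts the mean range of two observations to sigma.
    """
    return 0.886 * sum(abs(d) for d in delta_residuals) / len(delta_residuals)


if __name__ == "__main__":
    print(wssd((1.0, 2.0), (3.0, 2.0), coeffs=(0.5, 4.0), mse=0.25))
    print(pure_error_sd([1.0, -3.0]))
```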


FEATURES 

The  program  will  perform  both  procedures  together  or  each  procedure  singly,  at  the 
analyst’s  option. 

Output  for  the  procedure  to  detect  remote  observations  is  a  list  ordered  by  ascending 
value  of  WSSDJ  and  consisting  of: 

*  Observation  sequence  number. 

*  Associated  value  of  WSSDJ. 
Output for the procedure to compute an estimate of the standard deviation of pure
error consists of two lists. The first list is ordered according to ascending
predictions of the dependent variable and consists of:

*  Observation  sequence  number. 

*  Predicted  value  of  dependent  variable  and  prediction  error  (residual). 

*  Delta  residual  (difference  between  prediction  errors),  WSSD  value  for  “near 
neighbor”  candidates  and  rank  order  of  the  15  smallest  WSSD  values. 

The  second  list  is  ordered  according  to  ascending  values  of  WSSD  and  consists 
of: 

*  Cumulative  estimates  of  the  standard  deviation  of  pure  error. 

*  WSSD  value. 

*  Observation  sequence  numbers  for  the  observation  pair  associated  with  WSSD 
value. 

*  Delta  residual. 


REFERENCES 

1.  Daniel,  C.  and  Wood,  F.  S.  (1971),  Fitting  Equations  to  Data,  John  Wiley  &  Sons, 
Inc. 

2.  Montgomery,  D.  C.  and  Peck,  E.  A.  (1982),  Introduction  to  Linear  Regression 
Analysis,  John  Wiley  &  Sons,  Inc.,  pp.  154-167. 

3.  Montgomery, D. C., Martin, E. W. and Peck, E. A. (1980), “Interior Analysis of the
Observations in Multiple Linear Regression”, Quality Technology, Vol. 12, No. 3, pp.
165-173.


INPUT  GUIDE 

The  program  was  written  so  that  the  usual  input  data  file  for  a  multiple  regression 
problem  can  be  easily  modified  to  conform  to  the  input  requirements.  An  input  file  for 
NEARNEB  can  be  created  by  adding  two  records  to  the  beginning  of  a  file  of  multiple 
regression  data  and  one  (or  more  if  needed)  record  at  the  end.  All  records  are  mandatory 
in  the  input  file. 
Record
Type    Variable   Description                                      Columns  Format

1       ITYPE      =1, detect remote observations.                  1        I1
                   =2, estimate error from near neighbors.
                   =3, do both procedures.

        NIX        =1, list all possible values of WSSDJ.           2        I1
                   (4*NOBS - 10 values)
                   =0, list only first 100 values of WSSDJ.

        NIND       Number of regressor variables in the data        3-5      I3
                   set.
                   (NIND < 50)

        NAIND      Number of regressors actually in model being     6-8      I3
                   analyzed. (Note: this parameter allows
                   user to exclude regressors without rewriting
                   XFORM.)
                   (NAIND < NIND)

        NOBS       Number of observations.                          9-11     I3
                   (NOBS < 200)

        NPOSY      The column position of the response variable     12-14    I3
                   in the data set.

2       XFORM      Format (in parentheses) for reading regressor    1-80     20A4
                   and response variables.

3       DAT, YDAT  Values of regressor and response variables       1-80     XFORM
                   corresponding to XFORM and NPOSY.
                   Repeat this record as needed.

4       COEFF      Regression coefficients in the order $b_0$,      1-80     8F10.0
                   $b_1$, $b_2$, ..., $b_{NIND}$. Enter 0.0 for
                   coefficients corresponding to regressors in
                   the data set but not presently in the model.
                   Repeat this record as needed.
GOODNESS OF FIT ANALYSIS

A  goodness  of  fit  procedure  is  a  statistical  test  of  hypothesis  to  determine  if  a  sampled 
population  has  a  specified  probability  distribution.  In  other  words  and  as  the  name  implies, 
it  is  a  procedure  for  testing  the  “goodness  of  fit”  of  an  observed  to  a  theoretical  distribution. 
A  technique  commonly  used  is  to  construct  a  histogram  from  the  sampled  data  and  compare 
it visually with the hypothesized probability distribution, f(x). In constructing the
histogram, one chooses subintervals and calculates the ordinate for the ith subinterval
according to the proportion of observations in subinterval or cell i. This provides for a
visual comparison but suffers from the lack of objective criteria to judge whether the data
fit the specified distribution. There are two primary procedures which provide this
objectivity: Pearson’s chi-square goodness of fit test and the Kolmogorov-Smirnov (or K-S)
test of fit. Five of the goodness of fit programs in STATLIB are of the former type, and one
(UNKSGOF) is of the latter type. These procedures will be discussed in the paragraphs
below.

Pearson’s chi-square goodness of fit statistic is a measure of discrepancy between the
observed cell frequency $f_i$ and the expected cell frequency $e_i$ (under the hypothesized
distribution) combined over all cells. The test is based on the fact that the statistic

$$C = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

is approximately distributed as a chi-square random variable with $\nu = k - b - 1$ degrees of
freedom. In this expression, k is the number of subintervals or cells, and b is the number
of parameters in the probability distribution that have to be estimated from the data. To
ascertain the risk of falsely rejecting the hypothesized distribution, one compares the
statistic C with the percentiles of the chi-square distribution with $\nu$ degrees of freedom. The
goodness of the approximation depends on the expectations $e_i$, which must not be too small.
Many texts quote 5 as the smallest safe value and recommend the combining of adjoining
cells when necessary to achieve this minimum. However, Cochran (References 2 and 3) is
much less conservative in his statement, “With unimodal distributions, where expected
frequency will be small in the tails, arrange matters so that the minimum expectation in
each tail is at least one.” Most of the programs in STATLIB provide for user control over
the minimum number. The sensitivity or power of the test is dependent on the sample size
n, the number of cells k, and how the cells have been formed. In STATLIB, the goodness
of fit programs utilizing the chi-square approach either have cells with equidistant
boundaries or have cells with equal probability (and hence, with equal expected frequency). In
three of the programs, the user can specify his choice. Choosing the equal expectation
option usually provides one with a more sensitive test.
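
The statistic C and its degrees of freedom are simple to compute once the cells are formed. The Python sketch below also shows one possible way of combining adjoining cells until each expected frequency reaches a chosen minimum. The left-to-right merging rule is an assumption for illustration and is not necessarily the rule used by the STATLIB programs; both function names are hypothetical.

```python
def combine_cells(observed, expected, minimum):
    """Merge adjoining cells left to right until each expected frequency >= minimum.

    Any cells left over at the right end are merged into the last complete cell.
    """
    obs_out, exp_out = [], []
    obs_acc = exp_acc = 0.0
    for f, e in zip(observed, expected):
        obs_acc += f
        exp_acc += e
        if exp_acc >= minimum:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
            obs_acc = exp_acc = 0.0
    if obs_acc or exp_acc:
        if exp_out:
            obs_out[-1] += obs_acc
            exp_out[-1] += exp_acc
        else:
            # The minimum was never reached: everything forms a single cell.
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
    return obs_out, exp_out


def chi_square_gof(observed, expected, n_estimated):
    """Return (C, degrees of freedom) with k cells and b = n_estimated parameters."""
    c = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    dof = len(observed) - n_estimated - 1
    return c, dof


if __name__ == "__main__":
    obs, exp = combine_cells([0, 3, 5, 8, 2, 2], [1, 2, 6, 7, 3, 1], minimum=5)
    print(obs, exp, chi_square_gof(obs, exp, n_estimated=0))
```

The statistic is compared with a percentile of the chi-square distribution for the returned degrees of freedom; that lookup is left out of the sketch.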
The Kolmogorov-Smirnov test of fit is based on a comparison of the sample cumulative
distribution function with the cumulative distribution function (cdf) of the hypothesized
distribution. The cdf of the hypothesized distribution f(x) is defined by

$$F_0(x) = \int_{-\infty}^{x} f(t)\,dt$$

and the cdf of the sample is defined by $F_n(x) = i/n$ when x is the ith smallest value in the
sample; the K-S test statistic is given by

$$D_n = \max_{\text{all } x} \left| F_n(x) - F_0(x) \right| .$$

To ascertain the risk of falsely rejecting the hypothesized distribution, one compares $D_n$
with tabled percentiles of its distribution. The distribution of $D_n$ is independent of $F_0(x)$
provided the hypothesized distribution is completely specified with no unknown
parameters. If not, the distribution of $D_n$ depends on $F_0(x)$ and the percentiles must be
approximated by Monte Carlo sampling (Reference 6). In general, the K-S test is more
sensitive than the chi-square test and can be applied with fewer data points.
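
A minimal Python sketch of the K-S statistic follows. Because the sample cdf is a step function, the maximum over all x is found by checking both sides of each jump. The function names are hypothetical, and the normal cdf helper assumes a completely specified mean and standard deviation.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Maximum absolute difference between the sample cdf and a hypothesized cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # The sample cdf jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, abs(i / n - f0), abs(f0 - (i - 1) / n))
    return d


if __name__ == "__main__":
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
    print(ks_statistic(data, normal_cdf))
```

The resulting value is compared with tabled percentiles of the distribution of the statistic, which are not included here.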

The  user  is  referred  to  References  2,  3,  4,  and  5  for  amplification  on  the  chi-square 
test  and  to  References  1,  4,  and  6  for  amplification  on  the  K-S  test. 


REFERENCES 

1.  Bates,  Carl  B.  and  Orsulak,  Jacqueline  R.  (1968),  A  Computer  Program  for  the 
Kolmogorov  Goodness  of  Fit  Test  for  Normality,  NWL  Technical  Memorandum 
K-2/68,  NSWC,  Dahlgren,  VIRGINIA  22448. 

2.  Cochran, W. G. (1952), “The χ² Test of Goodness of Fit”, Annals of Math. Statistics,
23, p. 315.

3.  Cochran, W. G. (1954), “Some Methods of Strengthening the Common χ² Tests”,
Biometrics, 10, p. 417.

4.  Bowker,  Albert  H.  and  Lieberman,  Gerald  J.  (1972),  Engineering  Statistics,  Second 
Edition,  Prentice-Hall,  Inc.,  p.  452. 

5.  Ledermann,  Walter  (1984),  Handbook  of  Applicable  Mathematics,  Volume  VI: 
Statistics,  Part  A,  John  Wiley  and  Sons,  p.  358. 

6.  Lilliefors, Hubert W. (1967), “On the Kolmogorov-Smirnov Test of Normality With
Mean and Variance Unknown”, J. Am. Stat. Assoc., 62, p. 399.
UNORGOF


PURPOSE 

Program UNORGOF (Univariate Normal Goodness of Fit) performs a chi-square
goodness of fit test of the hypothesis that a random sample of data is from a univariate normal
parent population. The density function for the normal distribution is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$

where $\mu$ and $\sigma^2$ are the mean and variance, respectively. The chi-square test statistic is
calculated under the assumption that the population parameters, $\mu$ and $\sigma^2$, are equal to the
sample statistics,

$$\bar{x} = \frac{\sum x_i}{n} \quad \text{and} \quad s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1},$$

respectively, where $x_i$ represents the ith data point and n is the number of data points.
Initially, the program attempts to use intervals with equidistant boundaries to perform the
test statistic computations. However, this attempt is subject to the restriction that
theoretical frequencies for each interval must equal or exceed a user selected value (nominally
equal to five). Adjacent intervals are combined to meet this criterion before any goodness
of fit calculations are done. A histogram of the sample data is printed which assists the
analyst in determining the actual form of the sample distribution, should the
hypothesis of normality be rejected. For those situations where the analyst has several
samples, the program contains a pooling option. This option allows for testing the
individual samples and also for testing a pooled sample constructed from the individual
samples. The pooled sample is obtained by first subtracting, for each sample, the sample
mean from each data point in the sample. These deviations from the respective sample
means are then pooled for all samples, thereby creating a pooled sample which is unaffected
by any existing differences between the sample means. This option is especially
appropriate for testing normality assumptions for a statistical procedure known as analysis of
variance.
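
The construction of the pooled sample described above can be sketched in a few lines of Python. The function name `pooled_deviations` is hypothetical, and the sketch covers only the pooling step, not the subsequent chi-square test.

```python
def pooled_deviations(samples):
    """Pool several samples after removing each sample's own mean.

    samples: list of lists of data points, one inner list per sample.
    Returns a single list of deviations from the respective sample means.
    """
    pooled = []
    for sample in samples:
        mean = sum(sample) / len(sample)
        # Subtracting the sample mean removes differences between sample means.
        pooled.extend(x - mean for x in sample)
    return pooled


if __name__ == "__main__":
    print(pooled_deviations([[1.0, 2.0, 3.0], [10.0, 20.0]]))
```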

Reference  1  contains  additional  information  about  the  program  and  its  application 
under  its  original  name  of  CHITRAN.  Reference  2  contains  a  discussion  of  the  goodness 
of  fit  procedure  and  an  example  of  its  application  to  sample  data. 
FEATURES

The  program  allows  the  analyst  to  transform  the  data  prior  to  performing  the  chi-square 
goodness  of  fit  test  by  selecting  from  eleven  transformations. 

Standard  output  for  each  individual  sample  includes: 

*  Mean,  standard  deviation,  range,  maximum  and  minimum  of  the  sample  data. 

*  Histogram  of  the  data  with  chi-square  calculations,  observed  frequency  and 
theoretical  frequency  (under  the  normality  hypothesis)  for  each  interval. 

*  Value  of  the  chi-square  test  statistic  and  the  degrees  of  freedom. 

Optional  output  includes  similar  information  to  that  listed  above  for  the  pooled  sample 
and  a  listing  of  the  pooled  sample  data. 

REFERENCES 

1.  Gemmill, G. W., Herring, T. L. and Shade, R. L. (1967), CHITRAN - A 7030
(STRETCH) Computer Program for the Chi-square Test of Normality, NWL TM
K-2/67, NSWC, Dahlgren, VIRGINIA 22448.

2.  Wadsworth,  G.  W.  and  Bryan,  J.  G.,  (1974),  Applications  of  Probability  and  Random 
Variables,  McGraw-Hill  Inc.,  pp.  388  -  392. 


INPUT  GUIDE 

The specifications of the user-created input file are given below. A more detailed
version is provided in Reference 1. However, because of changes to the original program,
the input guide in Reference 1 is no longer appropriate. Record types 1, 2, 5, 6 and 7 are
mandatory.  Record  types  3  and  4  are  required  only  if  the  pooling  option  is  specified. 
Record  type  5  is  used  to  input  sample  specifications;  record  type  6  is  used  for  sample 
identification;  and  record  type  7  is  used  to  input  sample  data.  If  more  than  one  sample  is 
to  be  processed,  a  set  of  record  types  5,  6  and  7  must  be  repeated  for  each  sample. 

Record
Type    Variable   Description                                          Columns  Format

1       FORM       Format (in parentheses) for reading sample data.     1-80     10A8

2       NOSAM      Number of individual samples.                        1-5      I5
                   (NOSAM ≤ 500)

        GROUP      =0, Do chi-square test on individual samples         10       I1
                   only, i.e., do not pool the samples.
                   =1, Do chi-square test on individual samples
                   and on the pooled sample.
                   =2, Do chi-square test on pooled sample only.
                   Print the observations comprising the
                   pooled sample.
                   =3, Do chi-square test on pooled sample only.
                   Do not print the observations comprising
                   the pooled sample.

        NOTRAN     Number of transformations to be performed on         14-15    I2
                   sample data (including pooled sample).

        IT(1)      An integer designating first transformation for      16-20    I5
                   sample data. (Refer to transformation codes.)

        IT(2)      An integer designating second transformation for     21-25    I5
                   sample data.

        ...

        IT(11)     An integer designating eleventh transformation       66-70    I5
                   for sample data.

Transformation Codes

=1,  X = X  (no transformation)
=2,  X = ln X
=3,  X = ln(ln X)
=4,  X = ln(1 + X)
=5,  X = ln(1 + ln(1 + X))
=6,  X = √X
=7,  X = 1/X
=8,  X = 1 + 1/X
=9,  X = arcsin X
=10, X = 2 arcsin √X
=11, X = arcsin √X

Record  types  3  and  4  refer  to  the  pooling  option.  If  pooling  will  not  be  done,  they  must  be 
omitted  from  the  input  file. 

3       POOLID     Identification for pooled sample.                1-72     9A8

4       INTOVP     Minimum value for theoretical (expected)         6-10     I5
                   frequency for each interval in the chi-square
                   test for the pooled sample. Adjacent intervals
                   are combined to meet this criterion.

        NOINTP     Number of chi-square tests to be done for the    11-15    I5
                   pooled sample, where each test begins with a
                   different number of intervals.
                   (NOINTP < 100)

        INTERP(1)  Number of intervals for the first chi-square     16-20    I5
                   test on the pooled sample.
                   (INTERP < 400)

        INTERP(2)  Number of intervals for the second chi-square    21-25    I5
                   test on the pooled sample.

        ...

        INTERP(13) Number of intervals for the thirteenth           76-80    I5
                   chi-square test on the pooled sample.

                   If NOINTP > 13, additional records are needed.
                   The entries should be made in accordance with
                   format 16I5.

5       NOBS       Number of data points comprising the first       1-5      I5
                   sample.
                   (NOBS < 14,000)

        INTOV      Minimum value of theoretical (expected)          6-10     I5
                   frequency for each interval of the chi-square
                   test for the first sample. Adjacent intervals
                   are combined to meet this criterion.

        NOINT      Number of chi-square tests to be done for the    11-15    I5
                   first sample, where each test begins with a
                   different number of intervals.
                   (NOINT < 200)

        INTER(1)   Number of intervals for the first chi-square     16-20    I5
                   test on the sample.

        INTER(2)   Number of intervals for the second chi-square    21-25    I5
                   test on the sample.

        ...

        INTER(13)  Number of intervals for the thirteenth           76-80    I5
                   chi-square test on the sample.

                   If NOINT > 13, additional records are needed.
                   The entries should be made in accordance with
                   format 16I5.

6       IDENT      Identification for the first sample.             1-72     9A8

7       OBSERV     Data for the first sample. Repeat record type    1-80     FORM
                   7 as needed to enter all data from the first
                   sample.

Record  types  5,  6,  and  7  are  repeated  in  “triples”  for  each  additional  sample. 
BNORGOF


PURPOSE 


Program BNORGOF (Bivariate Normal Goodness of Fit) performs a chi-square
goodness of fit test of the hypothesis that a random sample of data is from a bivariate
normal parent population. The joint density function for the bivariate normal distribution
is given by

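\[
f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-Q/2},
\qquad -\infty < x_1 < \infty,\quad -\infty < x_2 < \infty,
\]

where

\[
Q = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2
  - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)
  + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right].
\]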

The parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$ represent the mean and variance for the marginal
distributions of $x_1$ and $x_2$, respectively. The parameter $\rho$ is the correlation coefficient of $x_1$
and $x_2$ and measures the linear dependence between the variables. The chi-square test statistic
is calculated under the assumption that the five population parameters, $\mu_1$, $\sigma_1^2$, $\mu_2$, $\sigma_2^2$
and $\rho$, are equal to the sample statistics

\[
\bar{x}_1 = \sum_i x_{1i}/n, \qquad s_1^2 = \sum_i (x_{1i} - \bar{x}_1)^2/(n-1),
\]

\[
\bar{x}_2 = \sum_i x_{2i}/n, \qquad s_2^2 = \sum_i (x_{2i} - \bar{x}_2)^2/(n-1),
\]

and

\[
r = \sum_i (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2) \Big/
    \left[\sum_i (x_{1i} - \bar{x}_1)^2 \sum_i (x_{2i} - \bar{x}_2)^2\right]^{1/2},
\]

respectively. The coordinates $(x_{1i}, x_{2i})$ represent the ith data point and n is the number of
data  points.  Each  data  point  is  classified  into  one  of  a  set  of  mutually  exclusive  intervals. 
These  intervals  are  areas  between  coaxial  contour  ellipses  and  are  determined  by  arbitrarily 
chosen  probability  values  under  the  null  hypothesis  of  bivariate  normality.  Observed  and 
expected  frequencies  are  then  computed  for  these  intervals  and  the  value  of  the  chi-square 
statistic  is  obtained  in  the  usual  manner. 
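
The classification step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the BNORGOF source: the function names are hypothetical, and it relies on the standard result that, when the five parameters are taken as known, Q follows a chi-square distribution with two degrees of freedom, so the probability enclosed by the contour ellipse through a point is 1 - exp(-Q/2).

```python
import bisect
import math

def sample_stats(points):
    """Return (mean1, var1, mean2, var2, r) for a list of (x1, x2) pairs."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points)
    s22 = sum((p[1] - m2) ** 2 for p in points)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points)
    return m1, s11 / (n - 1), m2, s22 / (n - 1), s12 / math.sqrt(s11 * s22)

def q_value(x1, x2, m1, v1, m2, v2, r):
    """Quadratic form Q of the bivariate normal density at (x1, x2)."""
    z1 = (x1 - m1) / math.sqrt(v1)
    z2 = (x2 - m2) / math.sqrt(v2)
    return (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)

def contour_probability(q):
    """Probability inside the contour ellipse Q = q (chi-square, 2 d.f.)."""
    return 1.0 - math.exp(-q / 2.0)

def classify(points, bounds):
    """Count points per elliptical interval.

    bounds is an increasing list of cumulative probabilities, e.g.
    [0.25, 0.5, 0.75]; len(bounds) + 1 intervals result.
    """
    stats = sample_stats(points)
    counts = [0] * (len(bounds) + 1)
    for x1, x2 in points:
        p = contour_probability(q_value(x1, x2, *stats))
        counts[bisect.bisect_left(bounds, p)] += 1
    return counts

def chi_square(observed, bounds):
    """Chi-square statistic from observed counts and probability bounds."""
    n = sum(observed)
    edges = [0.0] + list(bounds) + [1.0]
    total = 0.0
    for k, obs in enumerate(observed):
        expected = n * (edges[k + 1] - edges[k])
        total += (obs - expected) ** 2 / expected
    return total
```

A call such as `chi_square(classify(points, bounds), bounds)` then gives the statistic for one choice of probability bounds.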

A  detailed  explanation  of  this  application  of  the  chi-square  goodness  of  fit  criterion  is 
provided  in  Reference  1.  Reference  2  contains  a  description  of  the  program  under  its 
original  name  of  BI-CHI  and  the  results  from  a  numerical  example.  For  a  discussion  of  the 
bivariate  normal  distribution  the  reader  is  referred  to  Reference  3. 
FEATURES

The  program  allows  the  analyst  to  transform  the  data  prior  to  performing  the  chi-square 
goodness  of  fit  test  by  selecting  from  thirteen  transformations. 

Standard  output  for  each  data  sample  includes: 

*  A  listing  of  the  original  sample  data  and,  if  appropriate,  the  transformed  sample 
data. 

*  The sample minimum, maximum, mean, variance and standard deviation for each
variable; the sample correlation coefficient; and the sample regression lines for x1
on x2 and x2 on x1.

*  The upper bound, cumulative probability, observed frequency, theoretical (expected)
frequency and chi-square contribution for each interval.

*  The  value  of  the  chi-square  statistic  and  the  degrees  of  freedom. 

Optional  output  includes: 

*  The  specific  categorization  of  each  data  point  designating  within  which  elliptical 
interval  the  point  is  contained. 

*  A  plot  of  the  sample  data  which  may  also  display  probability  contours  at  the  user’s 
option. 


REFERENCES 

1.  Bates, C. B. (1966), The Chi-square Test of Goodness of Fit for a Bivariate Normal
Distribution, NWL TM K-77/66, NSWC, Dahlgren, VIRGINIA 22448.

2.  Bates,  C.  B.  and  Brown,  J.  (1967),  BI-CHI:  A  Computer  Program  for  the  Chi-square 
Goodness  of  Fit  Test  for  a  Bivariate  Normal  Distribution,  NWL  TM  K-72-67, 
NSWC,  Dahlgren,  VIRGINIA  22448. 

3.  Hald,  A.  (1952),  Statistical  Theory  with  Engineering  Applications,  John  Wiley  and 
Sons,  Inc.,  pp.  585  -  621. 


INPUT  GUIDE 

The specifications of the user-created input file are given below. An input guide is
also given in Reference 2. However, because of changes to the original program, the input
guide in Reference 2 is no longer appropriate. Record types 1, 2, 3, 4, 5 and 9 are
mandatory. Record type 1 is used for identification; record type 2 contains the format for
reading data points; record type 3 contains the number of goodness of fit tests to be done
on the sample data (additional transformations etc.); record type 4 contains program
parameters; record type 5 contains probability values for the goodness of fit test intervals;
and record type 9 contains sample data points. Additional goodness of fit tests can be made
on the original sample data by repeating record types 4 and 5 plus any desired optional
records following the data points on record type 9. Record types 6, 7 and 8 are optional.
Record type 6 is used to denote contour ellipses which will be plotted; record type 7
contains a transformation constant for x1; and record type 8 contains a transformation
constant for x2.


Record
Type    Variable   Description                                      Columns  Format

1       JOB        Problem description                              1-72     9A8

2       FMT        Format for reading the data terminator, IEND     1-80     10A8
                   (see record type 9), and the sample data
                   (include parentheses). IEND precedes the data
                   on each record. If IEND is blank, at least one
                   more data record will follow. On the last data
                   record IEND must equal the number of pairs of
                   (x1, x2)-values contained on the record. FMT
                   has the form (I2,  ).

3       NRUN       Number of tests to be performed on sample        4-5      I2
                   data. If NRUN > 1, records 4, 5 and optional
                   records must be repeated following the data
                   (record 9) for each additional test.

4       NPRINT     =0, categorization of data is not printed.       5        I1
                   =1, categorization is printed.

        JPLOT      =0, no plotting.                                 10       I1
                   =1, data and contour ellipses are plotted.
                   =2, data only is plotted.

        ITRAN1     An integer designating the transformation for    14-15    I2
                   x1. (Refer to transformation codes.)

        ITRAN2     An integer designating the transformation for    19-20    I2
                   x2. (Refer to transformation codes.)

        NPAIR      Number of pairs of (x1, x2)-values per record.   24-25    I2

        NN         Number of sets of probability values, input on   30       I1
                   record type 5, defining interval bounds for
                   the chi-square test. See COMMENTS.
                   (NN < 5)

        NL         Number of sets of probability values, input on   35       I1
                   record type 6, defining contour ellipses to be
                   plotted. See COMMENTS.
                   (NL < 5)

Transformation Codes

=1,  X = X (no transformation)
=2,  X = ln X
=3,  X = ln(ln X)
=4,  X = ln(A + X)
=5,  X = ln(B + ln(C + X))
=6,  X = √X
=7,  X = 1/X
=8,  X = 1/(D + X)
=9,  X = arcsin X
=10, X = 2 arcsin √X
=11, X = X/E
=12, X = sin X
=13, X = cos X

Constants A, B, C, D and E are input on record type 7 for x1 and record type 8
for x2.

5       PS(1)      The smallest probability value for the 1st set   1-5      F5.3
                   of probability values defining interval bounds
                   for the chi-square tests.

        PDELT(1)   The increment for the 1st set of probability     6-10     F5.3
                   values.

        PE(1)      The largest probability value for the 1st set    11-15    F5.3
                   of probability values.

        Additional sets (up to NN < 5, see record type 4) of probability values may be
        input if additional intervals of unequal probability are to be utilized for the
        chi-square test. Each additional set is input on this same record in a group of
        three values, smallest value, increment value and largest value, respectively,
        according to format 3F5.3. See COMMENTS.

Record type 6 is optional but must be included if plotting will be done, that is, if JPLOT
= 1 or 2 on record type 4.
6       PPS(1)     The smallest probability value for the 1st set   1-5      F5.3
                   of probability values defining contours to be
                   plotted.

        PPDELT(1)  The increment for the 1st set of probability     6-10     F5.3
                   values.

        PPE(1)     The largest probability value for the 1st set    11-15    F5.3
                   of probability values.

        Additional sets (up to NL < 5, see record type 4) of probability values may be
        input if variable probability increments between contours are to be utilized for
        the plots. Each additional set is input on this same record in a group of three
        values, smallest value, increment value and largest value, respectively,
        according to format 3F5.3. See COMMENTS.

Record  type  7  is  optional  but  must  be  included  if  ITRAN1  >  2  on  record  type  4.  If  2  < 
ITRAN1  <  13  but  the  transformation  selected  does  not  contain  one  of  the  constants  A,  B, 
C,  D  or  E,  a  blank  record  must  be  included  as  record  type  7. 


7       ATRAN1     The A-constant in transformation No. 4 for       1-14     E14.8
                   transforming x1.

        BTRAN1     The B-constant in transformation No. 5 for       15-28    E14.8
                   transforming x1.

        CTRAN1     The C-constant in transformation No. 5 for       29-42    E14.8
                   transforming x1.

        DTRAN1     The D-constant in transformation No. 8 for       43-56    E14.8
                   transforming x1.

        ETRAN1     The E-constant in transformation No. 11 for      57-70    E14.8
                   transforming x1.

Record  type  8  is  optional  but  must  be  included  if  ITRAN2  >  2  on  record  type  4.  If  2  < 
ITRAN2  <  13  but  the  transformation  selected  does  not  contain  one  of  the  constants  A,  B, 
C,  D  or  E,  a  blank  record  must  be  included  as  record  type  8. 


8       ATRAN2     The A-constant in transformation No. 4 for       1-14     E14.8
                   transforming x2.

        BTRAN2     The B-constant in transformation No. 5 for       15-28    E14.8
                   transforming x2.

        CTRAN2     The C-constant in transformation No. 5 for       29-42    E14.8
                   transforming x2.

        DTRAN2     The D-constant in transformation No. 8 for       43-56    E14.8
                   transforming x2.

        ETRAN2     The E-constant in transformation No. 11 for      57-70    E14.8
                   transforming x2.

9       IEND       =blank, this is not the last data record.        1-2      I2
                   >0, number of pairs of (x1, x2)-values on
                       this, the last record.

        X1(I) and  Input pairs of (x1, x2)-values. Data must be     3-80     Specified
        X2(I)      input in accordance with format FMT specified             in record
                   on record type 2. Maximum number of pairs is              type 2
                   4000.

COMMENTS 

The  program  variables  NN  and  NL  require  additional  discussion  in  order  to  clarify 
their  usage.  NN  refers  to  the  number  of  sets  of  probability  values  to  be  input  on  record 
type  5,  defining  the  interval  bounds  of  the  chi-square  test.  NL  refers  to  the  number  of  sets 
of  probability  values  to  be  input  on  record  type  6  (optional),  defining  the  contour  ellipses 
to  be  plotted.  Both  variables  are  input  according  to  the  same  format  and  are  applied  in 
identical  fashion;  therefore,  only  NN  and  its  application  to  the  chi-square  test  is  discussed 
below.  However,  the  comments  should  be  considered  to  include  NL  and  its  application  to 
the  plotting  of  contour  ellipses. 

To perform a chi-square test with equal probability intervals, the user should set NN=1.
For example, for a test with .05 probability in each interval, .05 would be input as the
smallest probability bound, .05 as the increment value and .95 as the largest probability
bound. This would result in a chi-square test with 20 intervals, each containing a
probability of .05. For this case the user should choose an interval probability value
which divides one (1) evenly to obtain an integral number of intervals.

If the user wishes to vary the probability within intervals, this can be accomplished by
setting NN > 1 (but < 5) and inputting the appropriate probability values in record type 5.
This is best illustrated with another example. Consider a case where the user wants
intervals of .05 probability from 0 to .20, intervals of .10 probability from .20 to .80 and
intervals of .05 probability from .80 to 1. NN must be set equal to 3 in record type 4. On
record type 5, three sets of three probability values each would be input. The first set would
be .05, .05 and .20. The second set would be .30, .10 and .80. The third set would be .85,
.05 and .95.
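
The way the (smallest, increment, largest) sets translate into interval bounds can be checked with a short Python sketch. The function names are hypothetical and the rounding to six decimals is an assumption made only to keep floating-point noise out of the result; how BNORGOF itself expands the sets is not documented here.

```python
def expand_probability_sets(sets):
    """Expand (smallest, increment, largest) triples into the ordered
    list of cumulative probability bounds they describe."""
    bounds = []
    for smallest, increment, largest in sets:
        steps = round((largest - smallest) / increment)
        for k in range(steps + 1):
            bounds.append(round(smallest + k * increment, 6))
    return bounds

def interval_probabilities(bounds):
    """Probability contained in each interval, with 0 and 1 as the
    implied outer limits."""
    edges = [0.0] + bounds + [1.0]
    return [round(b - a, 6) for a, b in zip(edges, edges[1:])]

# NN = 1: twenty intervals of probability .05 each.
equal = expand_probability_sets([(0.05, 0.05, 0.95)])

# NN = 3: the mixed .05 / .10 / .05 example from the text.
mixed = expand_probability_sets(
    [(0.05, 0.05, 0.20), (0.30, 0.10, 0.80), (0.85, 0.05, 0.95)]
)
```

For the NN = 3 example this yields 13 bounds and therefore 14 intervals: four of probability .05, six of .10 and four of .05.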




NSWC  TR  89-97 


EXPGOF 


PURPOSE 

Program EXPGOF (Exponential Goodness of Fit) performs the chi-square goodness of
fit test for the exponential probability density function. The form chosen for the
exponential density is

\[
f(x) = \frac{1}{\beta}\, e^{-x/\beta}, \qquad x > 0,
\]

with cumulative distribution function

\[
F(x) = \mathrm{Prob}(X \le x) = 1 - e^{-x/\beta}, \qquad x > 0.
\]

Parameter $\beta$ is assumed to be unknown and is estimated with the sample average. This
estimate is used in the exponential cumulative distribution function to construct cell
boundaries which provide equal cell frequency and hence, equal expectation under the
exponential hypothesis.
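
The construction of equal-probability cells can be illustrated with a small Python sketch. It is not the EXPGOF source: the function names are hypothetical, and the degrees of freedom shown (number of cells minus one, minus one for the estimated parameter) is the conventional choice, assumed here rather than taken from the program.

```python
import bisect
import math

def exponential_cells(data, psubi):
    """Estimate beta by the sample average and return the interior cell
    boundaries that give each cell probability psubi under the fitted
    exponential distribution."""
    beta_hat = sum(data) / len(data)
    ncell = round(1.0 / psubi)
    # Solve F(x) = k * psubi for x:  x = -beta * ln(1 - k * psubi)
    bounds = [-beta_hat * math.log(1.0 - k * psubi) for k in range(1, ncell)]
    return beta_hat, bounds

def chi_square_equal_cells(data, psubi):
    """Observed counts, chi-square statistic and (assumed) degrees of freedom."""
    beta_hat, bounds = exponential_cells(data, psubi)
    observed = [0] * (len(bounds) + 1)
    for x in data:
        observed[bisect.bisect_left(bounds, x)] += 1
    expected = len(data) / len(observed)   # equal expectation in every cell
    stat = sum((o - expected) ** 2 / expected for o in observed)
    dof = len(observed) - 2                # cells - 1 - one estimated parameter
    return observed, stat, dof
```

Because every cell has the same expected count, a histogram of `observed` should look roughly flat when the exponential hypothesis is reasonable.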

Further  information  regarding  the  exponential  distribution  is  contained  in  References 
1  and  2. 


FEATURES 

EXPGOF  features  include  the  following: 

*  A  listing  of  the  input  data. 

*  A  frequency  histogram  showing  the  observed  frequency  in  each  cell  and  the  chi- 
square  contribution  for  each  cell.  Since  the  expected  cell  frequencies  are  equal 
under  the  exponential  hypothesis,  this  histogram  will  appear  uniform  if  the  data 
supports  the  hypothesis. 

*  A  display  histogram  based  on  equidistant  cell  boundaries.  This  histogram  depicts 
the  shape  of  the  observed  frequency  distribution. 

*  The  value  of  the  chi-square  test  statistic  and  the  associated  degrees  of  freedom. 

*  The  capability  to  perform  multiple  analyses  of  the  same  data  set  in  one  computer 
run. 
REFERENCES

1.  Johnson, Norman L. and Kotz, Samuel (1970), Continuous Univariate Distributions
- 1, Houghton Mifflin Company, pp. 207 - 232.

2.  Lindgren, B. W. (1968), Statistical Theory, Second Edition, The Macmillan Company,
pp. 33 - 38.


INPUT  GUIDE 

The  user-created  input  file  consists  of  four  record  types  as  described  below. 


Record
Type    Variable   Description                                      Columns  Format

1       ID         Problem description.                             1-72     9A8

2       FORM       Variable format (in parentheses) for reading     6-45     4A10
                   the sample data.

        NTOT       Total number of sample data points.              51-60    I10

Record  3  is  used  to  input  the  sample  data  points.  The  points  are  entered  according  to  format 
FORM  given  on  Record  2  in  columns  6-45. 


4       NTEST      The number of the test.                          1-10     I10

        PSUBI      Subinterval probability (equal for each          11-20    F10.5
                   interval).

        NINT2      Number of equidistant intervals to be used in    21-30    I10
                   the display histogram.

Record  4  is  repeated  for  each  test  to  be  performed  on  the  data. 
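
As an illustration of the column layout above, the following Python sketch assembles the records of an EXPGOF input file as fixed-width strings. The function name and the sample values are hypothetical; the field widths follow the Columns and Format entries of the input guide, and the data records are assumed to be already formatted according to FORM.

```python
def expgof_input(ident, form, ntot, data_records, tests):
    """Build the lines of an EXPGOF input file.

    ident        problem description (record 1, columns 1-72)
    form         format string in parentheses (record 2, columns 6-45)
    ntot         number of data points (record 2, columns 51-60)
    data_records pre-formatted data lines (record 3)
    tests        list of (NTEST, PSUBI, NINT2) tuples (record 4, repeated)
    """
    lines = [f"{ident:<72.72}"]
    # Columns 1-5 blank, FORM in 6-45, columns 46-50 blank, NTOT in 51-60.
    lines.append(f"{'':5}{form:<40.40}{'':5}{ntot:>10d}")
    lines.extend(data_records)
    for ntest, psubi, nint2 in tests:
        lines.append(f"{ntest:>10d}{psubi:>10.5f}{nint2:>10d}")
    return lines

example = expgof_input(
    "TEST CASE",
    "(8F10.3)",
    3,
    ["     1.000     2.000     3.000"],
    [(1, 0.1, 10)],
)
```

Writing `"\n".join(example)` to a file gives one record per line in the order the input guide requires.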
WBLGOF


PURPOSE 

Program WBLGOF (Weibull Goodness of Fit) performs the chi-square goodness of fit
test for the two-parameter Weibull probability density function. The form chosen for the
Weibull density is

\[
f(x) = \frac{\beta}{\delta}\left(\frac{x}{\delta}\right)^{\beta-1} e^{-(x/\delta)^{\beta}}, \qquad x > 0,
\]

with cumulative distribution function

\[
F(x) = \mathrm{Prob}(X \le x) = 1 - e^{-(x/\delta)^{\beta}}, \qquad x > 0.
\]

Parameters $\beta$ and $\delta$ are both assumed to be unknown and, therefore, must be estimated
from the data. The method of maximum likelihood estimation is used for this purpose and
provides the following two equations:

\[
\frac{n}{\beta} - n\ln\delta + \sum_i \ln x_i
  - \frac{1}{\delta^{\beta}} \sum_i \left\{ x_i^{\beta}\,(\ln x_i - \ln\delta) \right\} = 0,
\]

\[
\delta^{\beta} = \frac{1}{n} \sum_i x_i^{\beta},
\]

where n is the number of observations in the sample data, $x_i$ is the ith observation, and ln
denotes the natural logarithm. These equations are solved numerically for $\beta$ and $\delta$, and the
results designated as $\hat{\beta}$ and $\hat{\delta}$, respectively. The estimates are used in the Weibull
cumulative distribution function to construct cell boundaries which provide equal cell frequency
and hence, equal expectation under the Weibull hypothesis.
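
A minimal Python sketch of one way to solve these equations follows. It is not the WBLGOF source. Substituting the second equation into the first removes $\delta$ and leaves a single equation in $\beta$, namely 1/β + (1/n) Σ ln x_i − Σ x_i^β ln x_i / Σ x_i^β = 0, which is then solved by Newton-Raphson. The function name, the starting value of 1.0, the iteration cap and the step-halving safeguard are all assumptions made for illustration; the `btol` argument plays the role of a convergence tolerance on successive iterates.

```python
import math

def weibull_mle(data, btol=1e-3, beta0=1.0, max_iter=100):
    """Maximum likelihood estimates (beta, delta) for positive data."""
    logs = [math.log(x) for x in data]
    n = len(data)
    mean_log = sum(logs) / n
    beta = beta0
    for _ in range(max_iter):
        w = [math.exp(beta * l) for l in logs]            # x_i ** beta
        s0 = sum(w)
        s1 = sum(wi * li for wi, li in zip(w, logs))
        s2 = sum(wi * li * li for wi, li in zip(w, logs))
        g = 1.0 / beta + mean_log - s1 / s0                # reduced equation
        dg = -1.0 / beta ** 2 - (s2 * s0 - s1 * s1) / (s0 * s0)
        new_beta = beta - g / dg                           # Newton-Raphson step
        if new_beta <= 0.0:                                # keep iterate positive
            new_beta = beta / 2.0
        converged = abs(new_beta - beta) < btol
        beta = new_beta
        if converged:
            break
    delta = (sum(x ** beta for x in data) / n) ** (1.0 / beta)
    return beta, delta
```

A loose tolerance is usually enough when the estimates only feed the cell boundaries; a tighter one is appropriate when the estimates themselves are the quantity of interest.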

Further  information  regarding  the  Weibull  distribution,  its  origin,  and  applications  are 
contained  in  References  1,  2,  and  3. 


FEATURES 

WBLGOF  features  include  the  following: 

*  A  listing  of  the  input  data. 

*  A frequency histogram showing the observed frequency in each cell and the
chi-square contribution for each cell. Since the expected cell frequencies are equal
under the Weibull hypothesis, this histogram will appear uniform if the data
supports the hypothesis.

*  A display histogram based on equidistant cell boundaries. This histogram depicts
the shape of the observed frequency distribution.

*  Estimates  of  the  Weibull  parameters  and  their  variances. 

*  The  value  of  the  chi-square  test  statistic  and  the  associated  degrees  of  freedom. 

*  The  capability  to  perform  multiple  analyses  of  the  same  data  set  in  one  computer 
run. 


REFERENCES 

1.  Johnson, Norman L. and Kotz, Samuel (1970), Continuous Univariate Distributions
- 1, Houghton Mifflin Company, pp. 250 - 271.

2.  Plait,  Alan  (1962),  “The  Weibull  Distribution  -  With  Tables”,  Industrial  Quality 
Control,  November,  1962. 

3.  Weibull, Waloddi (1951), “A Statistical Distribution Function of Wide Applicability”,
Journal of Applied Mechanics, September, 1951.


INPUT  GUIDE 

The  user-created  input  file  consists  of  four  record  types  as  described  below. 


Record
Type    Variable   Description                                      Columns  Format

1       ID         Problem description.                             1-72     9A8

2       FORM       Variable format (in parentheses) for reading     6-45     4A10
                   the sample data.

        NTOT       Total number of sample data points.              51-60    I10

Record  3  is  used  to  input  the  sample  data  points.  The  points  are  entered  according  to  format 
FORM  given  on  Record  2  in  columns  6-45. 


4       NTEST      The number of the test.                          1-10     I10

        PSUBI      Subinterval probability (equal for each          11-20    F10.5
                   interval).

        BTOL       Tolerance used in the numerical solution of      21-30    F10.5
                   the maximum likelihood equations.

        NINT2      Number of equidistant intervals to be used in    31-40    I10
                   the display histogram.

        STPSIZ     Stepsize (=0.0 to print the display histogram    41-50    F10.5
                   over the entire range of data).

Record  4  is  repeated  for  each  test  to  be  performed  on  the  data. 


COMMENTS 

The  input  variables  (1)  BTOL  and  (2)  STPSIZ  on  record  4  require  some  elaboration. 

(1) The Newton-Raphson procedure is used to numerically solve the likelihood equations.
This procedure requires a user-supplied tolerance to compare with the difference between
successive iterates. This tolerance is input as BTOL on record 4. A value of .001 is
probably adequate for most goodness of fit tests. However, if WBLGOF is being used as a
means of numerically extracting maximum likelihood estimates for the Weibull distribution,
a substantially lower tolerance should be input.

(2) The display histogram is controlled by input variables NINT2 and STPSIZ on record
4. If NINT2=10 and STPSIZ=0.0, for example, the data range is divided into 10 cells with
equidistant boundaries. Weibull data often include several observations which are
substantially larger than the remaining body of data. The largest of these values determines
the range and, hence, the cell boundary distance. If this value is large enough, it can force
the remaining body of data into one or two cells, which yields a poor display histogram. If
this happens on the first pass, it can be controlled on subsequent passes by setting STPSIZ
to an appropriate value for the distance between cell boundaries. This will truncate data
points with extreme values to provide a better display of the bulk of the data.
PERGOF


PURPOSE 

Program PERGOF (Pearson System Goodness of Fit) determines which distribution,
if any, from the Pearson system of frequency curves best fits a set of data. Any distribution
which is determined by its mean ($\mu$) and its second, third, and fourth central moments
($\mu_2$, $\mu_3$, $\mu_4$) is a member of the Pearson family of distributions. $\mu_2$ represents the
distribution variance. $\mu_3$ is related to the degree of skewness (lack of symmetry about the
mean) of the distribution while $\mu_4$ is related to the amount of kurtosis (peakedness) that the
distribution exhibits. This family contains distributions which are bell-shaped, J-shaped,
L-shaped, and U-shaped. Within these four general shapes is a continuum of skewed, flattened,
and peaked curves. More importantly, almost any set of data can be fit to a Pearson distribution
by merely equating the moments $\mu$, $\mu_2$, $\mu_3$, and $\mu_4$. Measures of skewness and kurtosis are
given by the following expressions:

\[
\beta_1 \ (\text{skewness}) = \mu_3^2 / \mu_2^3
\]

\[
\beta_2 \ (\text{kurtosis}) = \mu_4 / \mu_2^2
\]

For purposes of reference we point out that the values of skewness and kurtosis for the
widely-used normal distribution (symmetric and bell-shaped) are $\beta_1 = 0$ and $\beta_2 = 3$,
respectively. The Pearson system provides the analyst with an excellent source of
distributions which depart from normality in varying degrees of skewness and kurtosis.
Another important quantity, used in the classification of the distribution types in the
Pearson system, is the kappa “criterion.” This quantity is a function of $\beta_1$ and $\beta_2$ and is
given by the expression

\[
\kappa = \frac{\beta_1 (\beta_2 + 3)^2}{4\,(2\beta_2 - 3\beta_1 - 6)\,(4\beta_2 - 3\beta_1)}
\]

PERGOF can fit data to nine of the distribution types that are contained in the Pearson
system. Three of these (Types I, IV, and VI) are called main types while the remaining six
(Types II, III, V, VII, X, and XIII) are referred to as transition types. We note that Type
XIII is the well-known normal distribution. The user can choose to force fit the data to any
of these nine types or let the program determine the best fitting distribution from among
the three main types. If the latter option is selected, the choice is based on the value of
the kappa criterion. (If $\kappa < 0$, Type I is the indicated choice; if $0 < \kappa < 1$, Type IV is
indicated; and if $\kappa > 1$, Type VI is selected.)
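
The selection rule can be expressed compactly in Python. This is a sketch under stated assumptions, not the PERGOF source: the function names are hypothetical, the sample central moments are computed with divisor n, and the criterion is treated as undefined (returning None) when its denominator vanishes, which happens for example at the normal values $\beta_1 = 0$, $\beta_2 = 3$.

```python
def pearson_measures(data):
    """Return (beta1, beta2) from the sample central moments (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

def kappa(beta1, beta2):
    """Pearson's kappa criterion, or None when the denominator is zero."""
    denom = 4.0 * (2.0 * beta2 - 3.0 * beta1 - 6.0) * (4.0 * beta2 - 3.0 * beta1)
    if denom == 0.0:
        return None
    return beta1 * (beta2 + 3.0) ** 2 / denom

def main_type(k):
    """Main Pearson type indicated by kappa; None on a boundary value."""
    if k is None:
        return None
    if k < 0.0:
        return "I"
    if 0.0 < k < 1.0:
        return "IV"
    if k > 1.0:
        return "VI"
    return None
```

For instance, $\beta_1 = 1$, $\beta_2 = 3$ gives $\kappa = -1/3$ and therefore Type I, while $\beta_1 = 1$, $\beta_2 = 6$ gives $\kappa = 81/252$ and therefore Type IV.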
For further details regarding the Pearson system the user should consult References 1
and  2. 


FEATURES 

The quality of the fit of the input data to one of the Pearson type distributions is
assessed by means of the chi-square goodness of fit test of hypothesis. Two options are
available  to  the  user  for  computing  the  chi-square  statistic:  (1)  equal  probability  intervals, 
or  (2)  equal  length  intervals.  The  user  must  consult  a  table  of  the  chi-square  distribution 
in  order  to  interpret  the  value  of  this  statistic.  The  results  of  the  chi-square  test  may  depend 
upon  which  interval  option  is  chosen. 

PERGOF  output  features  include  the  following: 

*  Input  specifications  and  the  range  of  the  input  data. 

*  Estimates of the mean, variance, $\mu_3$, and $\mu_4$.

*  Estimates of $\beta_1$, $\beta_2$, and $\kappa$.

*  A table of chi-square test results for the chosen interval option. This table displays
for each interval the theoretical and observed frequencies, the chi-square
contribution, and the right-hand interval boundary in both original and standardized
units. The standardized boundary is computed as the original boundary value
minus the sample mean.

*  The value of the chi-square statistic and a statement identifying the Pearson
distribution type to which the fit was made.

*  A  graph  of  the  chosen  Pearson  distribution  type  (asterisks)  superimposed  on  a 
graph  of  the  input  data.  Location  of  the  data  is  indicated  graphically  by  horizontal 
dashed  lines  bounded  by  “+”  signs.  The  functional  form  of  the  chosen  distribution 
is  displayed  below  the  graph. 

PERGOF  allows  the  user  to  request  up  to  ten  fits  to  the  input  data,  i.e.,  nine  force  fits 
(one  to  each  of  the  nine  available  distribution  types)  and  one  fit  in  which  the  program 
selects  the  best  main  type.  In  the  event  that  more  than  one  fit  is  requested,  results  for  all 
the  equal  probability  interval  cases  are  printed  first,  followed  by  results  for  the  equal  length 
interval  cases. 

PERGOF  also  has  a  provision  for  pooling  individual  contributions  to  the  chi-square 
statistic  when  the  equal  length  interval  option  is  chosen.  The  user  specifies  a  minimum 
expected  (theoretical)  frequency.  If  the  expected  frequency  in  any  interval  is  less  than  this 
value,  adjacent  intervals  are  combined  until  this  condition  no  longer  exists.  Interpretation 
of  the  chi-square  statistic  must  then  be  made  with  respect  to  the  new  number  of  intervals. 
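
The pooling provision can be pictured with the following Python sketch. The manual does not state in which direction adjacent intervals are merged, so the left-to-right accumulation used here, with any leftover tail merged into the last pooled interval, is an assumption, as is the function name.

```python
def pool_intervals(expected, observed, minimum):
    """Combine adjacent intervals until each pooled interval has an
    expected frequency of at least `minimum`."""
    pooled_e, pooled_o = [], []
    acc_e = acc_o = 0.0
    for e, o in zip(expected, observed):
        acc_e += e
        acc_o += o
        if acc_e >= minimum:
            pooled_e.append(acc_e)
            pooled_o.append(acc_o)
            acc_e = acc_o = 0.0
    if acc_e > 0.0 or acc_o > 0.0:
        # Leftover tail that never reached the minimum on its own.
        if pooled_e:
            pooled_e[-1] += acc_e
            pooled_o[-1] += acc_o
        else:
            pooled_e.append(acc_e)
            pooled_o.append(acc_o)
    return pooled_e, pooled_o

def chi_square_statistic(expected, observed):
    """Sum of (observed - expected)^2 / expected over the intervals."""
    return sum((o - e) ** 2 / e for e, o in zip(expected, observed))
```

After pooling, the number of intervals used to look up the chi-square distribution is `len(pooled_e)`, not the number originally requested.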
INPUT GUIDE

The  user  must  create  a  data  file  containing  the  following  record  types.  The  first  record 
type  specifies  the  number  of  observations  in  the  data  set  to  be  fit,  the  number  of  equal 
probability  and  equal  length  interval  cases  to  be  processed  for  the  chi-square  statistic,  the 
number  of  distributional  fits  to  be  performed,  the  minimum  expected  frequency  for  pooling, 
the  maximum  number  of  iterations  to  be  used  in  the  numerical  integration  procedure  for 
determining  the  expected  frequencies,  and  a  flag  value  indicating  whether  or  not  a  problem 
identification  is  to  be  supplied.  The  second  record  type  specifies  the  format  under  which 
the  input  data  is  to  be  read.  The  third  record  type  is  included  only  if  the  equal  probability 
interval  option  is  to  be  exercised.  This  record  specifies  up  to  28  different  probability  values 
for  equal  probability  intervals.  The  fourth  record  type  is  included  only  if  the  equal  length 
interval  option  is  to  be  exercised.  This  record  specifies  up  to  28  different  numbers  of 
intervals  for  equal  length  intervals.  Record  type  five  is  included  only  if  the  number  of 
distributional  fits  to  the  input  data  has  been  specified.  (If  not,  PERGOF  will  select  the  best 
fitting  Pearson  main  type.)  This  record  contains  the  numbers  of  the  Pearson  distribution 
types to be fitted. Here the value 0 requests the best fitting main type. Up to 10 values
may be specified. Record type six is included only if the user has elected to supply a
problem identification. This record provides that identification. Record type seven
contains the input data according to the format provided by the user.

Input  specifics  are  provided  in  the  input  guide  below: 

Record
Type    Variable   Description                                      Columns  Format

1       NPERCD     Total number of observations in the input        1-10     I10
                   data set.
                   (NPERCD < 25000)

        NPCHI      Number of cases to be processed with the equal   11-20    I10
                   probability interval option. The probability
                   values are specified in record type 3.
                   (NPCHI < 28)

        NKCHI      Number of cases to be processed with the equal   21-30    I10
                   length interval option. The values for the
                   numbers of intervals are specified in record
                   type 4.
                   (NKCHI < 28)

        NDIST      Number of Pearson distribution fits to be made   31-40    I10
                   to the input data.
                   (NDIST < 10)

        POOL       Minimum expected (theoretical) frequency used    41-50    E10.4
                   as the criterion for pooling individual
                   interval contributions to the overall
                   chi-square statistic.

        MAXINT     Maximum number of iterations (DEFAULT = 100)     51-60    I10
                   to be used in the numerical integration scheme
                   for determining the expected (theoretical)
                   frequencies.
                   (1 < MAXINT < 100)

        IDFLAG     Problem identification option.                   61-70    I10
                   >0, identification will be supplied in record
                       type 6.
                   =0 or blank, no identification will be
                       supplied.

2       FMT        Format by which the input data is to be read.    1-72     9A8

Record type 3 is included only if NPCHI > 0.

3       PCHI(1)    1st probability value to be processed with the   1-10     E10.4
                   equal probability interval option.

        PCHI(2)    2nd probability value to be processed with the   11-20    E10.4
                   equal probability interval option.

        ...

        PCHI(7)    7th probability value to be processed with the   61-70    E10.4
                   equal probability interval option.

        Since NPCHI < 28, up to four record type 3’s may be required, each adhering
        to the above format.


Record type 4 is included only if NKCHI > 0.

Record
Type  Variable  Description                                      Columns  Format

4     KCHI(1)   1st value for the number of equally spaced       1-5      I5
                intervals to be processed with the equal
                length interval option.

      KCHI(2)   2nd value for the number of equally spaced       6-10     I5
                intervals to be processed with the equal
                length interval option.
        .
        .
      KCHI(14)  14th value for the number of equally spaced      66-70    I5
                intervals to be processed with the equal
                length interval option.

Since NKCHI ≤ 28, up to two record type 4’s may be required, each adhering to
the above format.

Record type 5 is included only if NDIST > 0.

Record
Type  Variable   Description                                     Columns  Format

5     MDIST(1)   1st Pearson distribution fit to the input data. 1-5      I5
                 =0, select the Pearson main type which
                 provides the best fit.
                 =1, 2, 3, 4, 5, 6, 7, 10, or 13,
                 force fits to the specified Pearson
                 distribution type (where distribution
                 type Roman numerals have been replaced
                 by their Arabic equivalents).

      MDIST(2)   2nd Pearson distribution fit to the input data. 6-10     I5
        .
        .
      MDIST(10)  10th Pearson distribution fit to the input      46-50    I5
                 data.

Record type 6 is included only if IDFLAG > 0.

6     OBSID      Problem identification.                         1-72     9A8

Record  type  7  contains  the  input  data  according  to  the  format  specified  by  FMT  on  record 
type  2. 


COMMENTS 

Simpson’s quadrature formula is the numerical integration technique employed in
PERGOF to find required areas associated with the Pearson distributions. Since the Pearson
system contains such a wide variety of shapes and Simpson’s rule is a generalized
method, it follows that the numerical integration scheme will not perform adequately for
all possible sets of input data. Furthermore, for a given set of data, it is possible that the
integration scheme will be adequate for one of the chi-square interval options but not the
other. This means that if PERGOF fails to handle the user’s input data for one choice of
the interval option, the other option should be tried before abandoning PERGOF as an
analysis tool. Should the numerical integration scheme in PERGOF fail to handle a given
data set, an appropriate message is printed for the user.

While PERGOF allows the user to fit a set of data to one of nine Pearson distribution
types, subroutine RANPDI is available in STATLIB to permit the user to generate random
numbers from one of these types.
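
As an illustration of the kind of quadrature referred to above, the following Python sketch applies the composite Simpson's rule to a smooth density. It is not the PERGOF implementation: the function names, the choice of the standard normal density as the integrand, and the number of subintervals are assumptions made only for this example.

```python
import math


def simpson(f, a, b, n_intervals=100):
    """Composite Simpson's rule on [a, b]; n_intervals is forced to be even."""
    if n_intervals % 2:
        n_intervals += 1
    h = (b - a) / n_intervals
    total = f(a) + f(b)
    for k in range(1, n_intervals):
        # Interior points alternate between weight 4 (odd k) and weight 2 (even k).
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def std_normal_pdf(x):
    """Standard normal density, used here only as a convenient test integrand."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Area under the density between -1.96 and 1.96 (close to 0.95).
area = simpson(std_normal_pdf, -1.96, 1.96)
print(round(area, 4))
```

For sharply peaked or J-shaped densities a fixed number of subintervals can be inadequate, which is consistent with the caution given above.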
UNKSGOF


PURPOSE 


Program UNKSGOF (Univariate Normal Kolmogorov-Smirnov Goodness of Fit)
performs a Kolmogorov-Smirnov (K-S) test of hypothesis which assesses the agreement of
a sample cumulative distribution function with the cumulative distribution function
(cdf) of the normal distribution. Consider a random variable X with a continuous cdf, F(x).
Given a random sample of n observations, the sample cdf, $F_n(x)$, is defined by

$$F_n(x_{(i)}) = i/n$$

where $x_{(i)}$ is the ith smallest value in the sample. The cdf of the normal distribution is given
by

$$F(x) = \mathrm{Prob}(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$

where

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty$$

and $\mu$ and $\sigma^2$ are the mean and variance, respectively, of the distribution. These parameters
are assumed to be unknown and unspecified in the normality hypothesis, and hence, are
estimated from the data by the sample statistics

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 .$$

These estimates are then substituted for $\mu$ and $\sigma^2$ in f(x) above in order to evaluate $F_0(x)$,
the cdf under the normal hypothesis. The K-S test statistic is based on the maximum absolute
deviation of $F_n(x_{(i)})$ from $F_0(x_{(i)})$ and is given by

$$D_n = \max_i \left| F_n(x_{(i)}) - F_0(x_{(i)}) \right| .$$

The percentiles of the distribution of $D_n$ have been approximated by Monte Carlo sampling
(Reference 3). A table of critical values at the $\alpha$ level of significance appears in this reference
and is included in the “COMMENTS” section of UNKSGOF to enable the user to
conduct the required test of hypothesis. The hypothesis that a random sample of n observations
comes from an underlying normal distribution is tested by comparing $D_n$ with the
critical value, $D_{n,\alpha}$, from the table, where $\alpha$ is the probability of falsely rejecting the hypothesis.
If $D_n > D_{n,\alpha}$, reject the hypothesis at the $\alpha$ level of significance; otherwise, do not reject the
hypothesis.

Unlike the chi-square test, the K-S test requires no grouping of the data and it is applicable
to very small samples (n ≥ 4). Further discussion of the K-S test can be found in
References 1, 2, and 3.


FEATURES 

UNKSGOF  allows  the  user  to  transform  the  sample  data  prior  to  performing  the  K-S 
test  for  normality  by  specifying  one  of  thirteen  available  transformations. 

UNKSGOF  output  features  include  the  following: 

*  Input  specifications  and  the  original  and  transformed  data. 

*  Sample  statistics:  minimum  and  maximum  values,  range,  sample  size,  mean, 
variance,  and  standard  deviation. 

*  A  table  of  the  K-S  test  computations  for  each  sample  value.  Included  in  the  table 
is  a  listing  of  the  transformed  data  values  in  ascending  order,  their  standardized 
values  (i.e.,  subtract  sample  mean  and  divide  by  sample  standard  deviation),  and 
the  corresponding  theoretical  and  sample  cdf  values  along  with  their  absolute 
differences. 

*  The  value  of  the  K-S  test  statistic  together  with  the  number  of  the  observation  in 
the  ordered  sample  which  yielded  this  maximum  absolute  deviation. 

UNKSGOF also allows the user to analyze multiple sets of data in a single computer
run.


REFERENCES 

1.  Bates,  Carl  B.  and  Orsulak,  Jacqueline  R.  (1968),  A  Computer  Program  for  the 
Kolmogorov  Goodness  of  Fit  Test  for  Normality,  NWL  Technical  Memorandum 
K-2/68,  NSWC,  Dahlgren,  VIRGINIA  22448. 

2.  Bowker,  Albert  H.  and  Lieberman,  Gerald  J.  (1972),  Engineering  Statistics,  Second 
Edition,  Prentice-Hall,  Inc.,  pp.  454  -  458. 

3.  Lilliefors,  Hubert  W.  (1967),  “On  the  Kolmogorov-Smirnov  Test  of  Normality  With
Mean  and  Variance  Unknown”,  Journal  of  the  American  Statistical  Association,  Vol. 
62,  pp.  399  -  402. 
INPUT  GUIDE


Some  changes  have  been  made  to  the  original  program  since  the  date  of  Reference  1. 
For  this  reason  the  input  guide  specified  below  should  take  precedence  over  the  one  given 
in  Reference  1.  The  user  must  create  a  data  file  containing  the  following  record  types.  The
first  record  type  specifies  the  problem  description  and  the  second  specifies  the  format  under 
which  the  input  data  is  to  be  read.  The  third  record  type  specifies  the  number  and  identity 
of  the  transformations  to  be  performed  on  the  input  data  as  well  as  the  number  of  data 
values  per  record.  The  fourth  record  type  is  included  only  if  transformations  4,  5,  8,  or  11 
are  selected  by  the  user.  It  provides  the  values  of  constants  required  in  specifying  these 
transformations.  Record  type  five  is  used  to  input  the  sample  data.  If  more  than  one  set 
of  sample  data  is  to  be  subjected  to  the  K-S  test  for  normality  in  the  same  computer  run, 
record  types  1  through  5  must  be  repeated  for  each  such  set. 


Input specifics are provided in the input guide below:

Record
Type  Variable   Description                                     Columns  Format

1     JOB        Problem description.                            1-72     9A8

2     FMT        Format used to read in the data terminator,     1-80     8A10
                 ITEST (see record 5), and the sample data
                 (include parentheses). ITEST precedes the
                 data on each record. If ITEST is blank, at
                 least one data value follows on that record.
                 On the last record ITEST must equal the
                 number of observations on that record. FMT
                 has the form (I2, ...).

3     NRUN       Number of transformations to be performed       1-2      I2
                 on the sample data.
                 (1 ≤ NRUN ≤ 13)

      ITRAN(1)   =0, do not perform transformation number 1.     11       I1
                 =1, perform transformation number 1.
                 (Refer to the list of transformations
                 given below.)

      ITRAN(2)   =0, do not perform transformation number 2.     12       I1
                 =1, perform transformation number 2.
        .
        .
      ITRAN(13)  =0, do not perform transformation number 13.    23       I1
                 =1, perform transformation number 13.

      NBR        Number of data values per record.               26-27    I2

      NC         =0, record type 4 is not included.              30       I1
                 =1, record type 4 is included.
Transformation Number    Transformation Codes

         1               X = X (no transformation)
         2               X = ln X
         3               X = ln(ln X)
         4               X = ln(A + X)
         5               X = ln(B + ln(C + X))
         6               X = √X
         7               X = 1/X
         8               X = 1/(D + X)
         9               X = arcsin X
        10               X = 2 arcsin √X
        11               X = X/E
        12               X = sin X
        13               X = cos X
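
To make the transformation codes concrete, the sketch below expresses the thirteen transformations as a Python lookup table. It is an illustration only and not part of UNKSGOF: the names `TRANSFORMS` and `transform` are hypothetical, natural logarithms are assumed for "ln", and the trigonometric functions are assumed to work in radians, which this guide does not state.

```python
import math

# Hypothetical lookup table: transformation number -> function of (x, constants).
# The constants A, B, C, D, E correspond to those supplied on record type 4.
TRANSFORMS = {
    1: lambda x, c: x,
    2: lambda x, c: math.log(x),
    3: lambda x, c: math.log(math.log(x)),
    4: lambda x, c: math.log(c["A"] + x),
    5: lambda x, c: math.log(c["B"] + math.log(c["C"] + x)),
    6: lambda x, c: math.sqrt(x),
    7: lambda x, c: 1.0 / x,
    8: lambda x, c: 1.0 / (c["D"] + x),
    9: lambda x, c: math.asin(x),
    10: lambda x, c: 2.0 * math.asin(math.sqrt(x)),
    11: lambda x, c: x / c["E"],
    12: lambda x, c: math.sin(x),
    13: lambda x, c: math.cos(x),
}


def transform(data, number, constants=None):
    """Apply transformation `number` (1-13) to every value in `data`."""
    func = TRANSFORMS[number]
    return [func(x, constants or {}) for x in data]


print(transform([1.0, 4.0, 9.0], 6))              # square roots
print(transform([1.0, 3.0], 8, {"D": 1.0}))       # 1/(D + X) with D = 1
```

Values outside a transformation's domain (for example, a non-positive value under transformation 2) raise a Python error in this sketch; how UNKSGOF itself handles such values is not described here.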


Record type 4 is included only if transformations 4, 5, 8, or 11 have been requested by the
user.


Record
Type  Variable  Description                               Columns  Format

4     A         Constant in transformation number 4.      1-14     F14.6
      B         Constant in transformation number 5.      15-28    F14.6
      C         Constant in transformation number 5.      29-42    F14.6
      D         Constant in transformation number 8.      43-56    F14.6
      E         Constant in transformation number 11.     57-70    F14.6

Although UNKSGOF does not require that the sample size (n) be input, it must fall in the
following range:

4 ≤ n ≤ 2500

Record
Type  Variable  Description                                      Columns  Format

5     ITEST     =blank, this is not the last record containing   1-2      I2
                sample observations.
                >0, number of sample observations on
                this, the last record.

      XX        Array containing sample input data. Data         3-80     Specified
                must be input in accordance with format                   in record
                FMT specified in record type 2.                           type 2

COMMENTS 

In the output of UNKSGOF, results of the K-S test for normality include two columns
of absolute values labelled (1) F0(J) - Fn(J-1) and (2) F0(J) - Fn(J). The reason that there
are two rather than one such columns is that the K-S test is a two-sided test of hypothesis.
The rejection region for the test is in two parts, a lower rejection region and an upper
rejection region. For each data value a value of $D_n$ is computed for both regions. $D_n$ takes
the value given by (1) above for the lower region and (2) for the upper region. The K-S
test statistic is the maximum of these two values taken over all the data values. The interested
user can consult Reference 1 for complete details.
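
The computation described above can be sketched in Python as follows. This is a minimal illustration, not the UNKSGOF source: the function names are hypothetical, the sample variance is assumed to use the divisor n - 1, and the normal cdf is evaluated with the error function rather than by whatever method UNKSGOF uses.

```python
import math


def normal_cdf(z):
    """Standard normal cdf evaluated through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_normality_statistic(sample):
    """Return (D_n, J): the K-S statistic and the ordered-sample index attaining it.

    The mean and variance are estimated from the sample, so the result is meant
    to be compared with critical values tabulated for that case.
    """
    n = len(sample)
    xs = sorted(sample)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    d_max, j_max = 0.0, 0
    for j, x in enumerate(xs, start=1):
        f0 = normal_cdf((x - mean) / sd)      # theoretical cdf at the standardized value
        lower = abs(f0 - (j - 1) / n)         # column (1): F0(J) - Fn(J-1)
        upper = abs(f0 - j / n)               # column (2): F0(J) - Fn(J)
        d = max(lower, upper)
        if d > d_max:
            d_max, j_max = d, j
    return d_max, j_max


d_n, j = ks_normality_statistic([1.0, 2.0, 3.0, 4.0])
print(round(d_n, 3), j)
```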

To enable the user to interpret the value of the K-S test statistic, $D_n$, computed by
UNKSGOF, a table of critical values is required. Such a table is provided below and is
taken from Reference 3. Let $D_{n,\alpha}$ denote the required critical value, where n is the sample
size and $\alpha$ is the level of significance of the hypothesis test. If $D_n$ equals or exceeds the
tabled value for $D_{n,\alpha}$, the hypothesis of normality is rejected. Otherwise we fail to reject
the hypothesis.
$D_{n,\alpha}$ CRITICAL VALUES FOR THE K-S TEST FOR NORMALITY¹

Sample                    α: Level of Significance
size n       0.20       0.15       0.10       0.05       0.01

    4        .300       .319       .352       .381       .417
    5        .285       .299       .315       .337       .405
    6        .265       .277       .294       .319       .364
    7        .247       .258       .276       .300       .348
    8        .233       .244       .261       .285       .331
    9        .223       .233       .249       .271       .311
   10        .215       .224       .239       .258       .294
   11        .206       .217       .230       .249       .284
   12        .199       .212       .223       .242       .275
   13        .190       .202       .214       .234       .268
   14        .183       .194       .207       .227       .261
   15        .177       .187       .201       .220       .257
   16        .173       .182       .195       .213       .250
   17        .169       .177       .189       .206       .245
   18        .166       .173       .184       .200       .239
   19        .163       .169       .179       .195       .235
   20        .160       .166       .174       .190       .231
   25        .142       .147       .158       .173       .200
   30        .131       .136       .144       .161       .187
Over 30    .736/√n    .768/√n    .805/√n    .886/√n   1.031/√n

¹ Reprinted by permission from “On the Kolmogorov-Smirnov Test of Normality With
Mean and Variance Unknown” by Lilliefors, Hubert W., Journal of the American Statistical
Association, Vol. 62 (1967), pp. 399-402.
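
For samples larger than 30 the tabled critical value is a coefficient divided by the square root of n. The short Python sketch below applies that row of the table and the decision rule stated in this section. The function names are hypothetical; only the five tabled significance levels are supported, and no interpolation between levels is attempted.

```python
import math

# Coefficients from the "Over 30" row of the table, keyed by significance level.
LARGE_SAMPLE_COEFF = {0.20: 0.736, 0.15: 0.768, 0.10: 0.805, 0.05: 0.886, 0.01: 1.031}


def critical_value_large_n(n, alpha):
    """Critical value for n > 30 at one of the tabled significance levels."""
    if n <= 30:
        raise ValueError("use the tabled entries directly for n <= 30")
    return LARGE_SAMPLE_COEFF[alpha] / math.sqrt(n)


def reject_normality(d_n, n, alpha):
    """Reject when the statistic equals or exceeds the critical value."""
    return d_n >= critical_value_large_n(n, alpha)


print(round(critical_value_large_n(100, 0.05), 4))   # 0.886 / 10
print(reject_normality(0.12, 100, 0.05))
```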
RANDOM


PURPOSE 

Program  RANDOM  evaluates  the  usefulness  of  candidate  pseudo-uniform  random 
number generators which have been designed for use on any computer system. The candidate
generator must produce random variates which purportedly come from a continuous
uniform  distribution  over  the  interval  0  to  1.  Program  RANDOM  subjects  a  sequence  of 
pseudo-uniform  numbers  from  the  candidate  generator  to  a  collection  of  statistical  “tests  of 
randomness.”  These  tests  are  designed  to  expose  departures  from  the  assumptions  of 
independence  and  uniformity  for  the  generated  sequence  of  random  variates.  The  results 
of  these  tests  can  be  used  to  judge  the  adequacy  of  the  candidate  generator. 


FEATURES 

Program  RANDOM  performs  11  different  statistical  tests  of  hypothesis  on  a  single
sequence  of  10000  pseudo-uniform  random  numbers  produced  by  the  candidate  generator. 
Each  test  is  conducted  at  the  five  percent  level  of  significance  and  the  decision  “FAIL  TO 
REJECT”  or  “REJECT”  the  appropriate  hypothesis  appears  as  part  of  the  output  for  each 
test.  The  tests  performed  are: 

1.  Mean  and  Variance  tests 

2.  Frequency  test 

3.  Kolmogorov-Smirnov  (K-S)  test

4.  Maximum  of  t  test 

5.  Gap  test 

6.  Poker  test 

7.  Coupon  collector’s  test 

8.  Permutation  test 

9.  Runs  test 

10.  Serial  test  for  successive  pairs 

11.  Serial  correlation  test

A frequency table and a graph of the sample cumulative distribution function are displayed
in conjunction with the output from the frequency test.

While program RANDOM helps to identify bad generators, we caution that a candidate
generator should not necessarily be discarded just because it fails one or two of the tests
for randomness since, in our case, each statistical test permits failure for a good generator
five percent of the time. If one or more tests fail, it is recommended that program RANDOM
be rerun one or more times using different input sequences of size 10000, each
generated by a different starting seed value, in order to obtain a better feel for the candidate
generator’s reliability.

For  a  detailed  discussion  of  the  above  tests  as  well  as  the  interpretation  of  their  results, 
consult  Reference  1. 
INPUT  GUIDE

Some  changes  have  been  made  to  the  original  program  since  the  date  of  Reference  1. 
For  this  reason  the  input  guide  specified  below  should  take  precedence  over  the  one  given 
in  Reference  1. 

The  user  must  first  generate  a  sequence  of  exactly  10000  random  variates  from  the 
candidate  generator.  The  required  data  file  then  consists  of  one  record  type  repeated  10000 
times,  each  record  containing  one  of  the  generated  random  variates.  More  specifically,  the 
data  file  is  constructed  as  follows: 


Record
Type  Variable      Description               Columns  Format

1     ITEM(1)       1st random variate        1-22     E22.14
      ITEM(2)       2nd random variate        1-22     E22.14
        .
        .
      ITEM(10000)   10000th random variate    1-22     E22.14
Any floating point number placed in the first 22 columns of a record will satisfy the above
format specification for each variate value. The user is cautioned, however, that if numbers
in E format are included in the data file, their exponents must be right-justified to span the
field length of 22.
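
As a hedged illustration of preparing such a file, the Python sketch below writes 10000 variates, one per record, each right-justified in a 22-column field with 14 decimal digits. Python's built-in generator stands in for the candidate generator purely for the example; the function name and the seed are assumptions, and an in-memory buffer is used in place of a real file.

```python
import io
import random


def write_variates(stream, variates):
    """Write one variate per record, right-justified in columns 1-22 (E22.14 style)."""
    for u in variates:
        stream.write(f"{u:22.14E}\n")


rng = random.Random(12345)                 # stand-in for the candidate generator
variates = [rng.random() for _ in range(10000)]

buf = io.StringIO()                        # replace with open("datafile", "w") in practice
write_variates(buf, variates)
lines = buf.getvalue().splitlines()
print(len(lines), repr(lines[0]))
```

Because each field is right-justified, the exponent ends in column 22, in keeping with the caution above.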




NSWCTR  89-97 


POWER  EVALUATION 

Statistical hypothesis testing is a decision making process which tells the experimenter
whether to accept or reject hypotheses regarding characteristics of the sampling population,
i.e., the population from which a sample is extracted. These characteristics are usually
stated in terms of the parameters in the probability distribution which governs the sampling
population. For this discussion, $\theta$ will be used to designate the population parameter of
interest. The hypothesis under test is referred to as the null hypothesis and designated $H_0$.
The hypothesis complementary to $H_0$ is referred to as the alternative hypothesis and denoted
by $H_1$. Using this notation, a generic test of hypothesis can be stated as follows:

$$H_0: \theta \in \omega$$

$$H_1: \theta \in \omega' .$$

In this statement, $\omega$ is simply a subset of $\Omega$, the set of admissible values for $\theta$, and
$\omega' = \Omega - \omega$. For example, if $\theta$ represents the population variance $\sigma^2$, then $\Omega$ is the set of
positive real numbers. If $\omega$ is taken as the subset of reals in the interval (0,4), then $\omega'$ is
the set of reals greater than or equal to 4.

To test the hypothesis that $H_0$ is true, a random sample of size n is taken from the
population and a numerical value, referred to as the test statistic, $T_0$, is computed. A critical
region R is then determined such that the decision rule rejects $H_0$ (and accepts $H_1$) if $T_0 \in R$
and accepts $H_0$ otherwise. Ideally, $T_0 \in R$ whenever $H_1$ is true and $T_0 \notin R$ whenever $H_0$
is true. However, this kind of test does not exist unless one samples the entire population.
In general, our decision rule is subject to two kinds of error:

Type I error: Reject $H_0$ when $H_0$ is true
Type II error: Accept $H_0$ when $H_1$ is true

The probabilities of committing these errors are referred to as $\alpha$ and $\beta$, respectively. That
is,

$\alpha$ = Prob(committing a Type I error)
  = Prob(rejecting $H_0$ when $H_0$ is true)
  = Prob($T_0 \in R$ when $\theta \in \omega$)

$\beta$ = Prob(committing a Type II error)
  = Prob(accepting $H_0$ when $H_1$ is true)
  = Prob($T_0 \notin R$ when $\theta \in \omega'$).

The first of these, $\alpha$, is controlled by the experimenter through the size of the critical region
R. The second error, $\beta$, is a function of $\alpha$, n, and a specific value of $\theta$. Its computation
is usually involved even with the aid of high speed computers. The complement of $\beta$,
$1 - \beta$, is referred to as the power of the test and is simply the Prob(rejecting $H_0$ when $H_1$ is
true). Hence, low $\beta$ is synonymous with high power.
Oftentimes, experimenters make strong conclusions from test results only upon rejecting
$H_0$. In this situation, a specific value for power is of little interest. However, if
strong conclusions are also to be made upon accepting $H_0$, then the power of the test is of
interest and one must have a means to evaluate it. In fact, for planned experiments, the
usual criterion for selecting sample size is to ensure that the power of the test is sufficiently
high for a given value or values of $\theta$. While it is usually not practical to solve for n for
given values of $\alpha$, $\beta$, and $\theta$, one can evaluate power for a range of values of n and select
the sample size on the basis of these results. References 1, 2, 5 and 7 provide additional
information on the general theory and application of hypothesis testing and the role of
power in sample size selection.

To aid the experimenter in selecting the sample size n for his experiment, power curves
have been prepared by various authors for various types of tests. See, for example, References
1, 3, and 8. Power curves show power ($1 - \beta$) as a function of the parameter under
test for specified values of n and $\alpha$. (In some cases, operating characteristic curves have
been prepared which show $\beta$ instead of $1 - \beta$. These curves convey the same information
as power curves and, hence, fall into the category of power curves.) Curves of this type are
helpful to the experimenter, but they have some serious limitations. First, families of
curves for different values of n are usually placed on the same graph. While this provides
information relating test sensitivity to sample size, the many curves placed on a single
graph make it very difficult to find power for values of $\theta$ which are close to the boundary
of $\omega$ and $\omega'$. (In the last example, this would refer to values of $\sigma^2$ close to 4.) Second,
power curves are available for only the more frequently used values of $\alpha$, usually .05 and
.01. If it were necessary to evaluate power for values of $\alpha$ other than these common values,
one would have to resort to extensive computation. And third, power curves are not
available from a single source which, in some cases, requires the experimenter to search
the literature to find the set of curves appropriate for his experiment.

The power programs in STATLIB have been prepared with the intent of eliminating
the power curve problems addressed above. Power is presented in numerical tabular form
which eliminates any problem with reading values. In addition, power can be computed
for any user selected values of $0 < \alpha < 1$. And lastly, STATLIB provides the experimenter
with the capability to compute power for most of the common statistical tests and for some
of the “not so common” tests. In most cases, this will eliminate the need to search the
literature for the appropriate power curve in sample size determination.

The power programs in STATLIB are divided into two sections, one for continuous
probability distributions and one for discrete. In the continuous section, the critical point
(beginning of the critical region in the tail of the distribution) for the test is not printed.
These values are readily available as percentiles of the common sampling distributions, i.e.,
the normal, Student’s t, chi-square, and F distributions. Furthermore, each page of
tabular power values is associated with many values of n, each of which has a different
critical point. Hence, the user is referred to statistical tables (Reference 4, for example) for
the critical point appropriate for his test. In the discrete section, the critical points are
printed since they are not readily available in statistical tables. These critical values are the
largest (smallest) integers such that the critical regions have size at most $\alpha$. The problem here is
that the test statistic $T_0$ for a discrete test is an integer and hence, the critical point is restricted
to an integer. Therefore, one can specify $\alpha = .05$ but may obtain a critical region
much smaller than .05. To provide for the user who requires a discrete critical region of
size $\alpha$ exactly, the discrete power programs provide a randomized test option (Reference
6). In addition to a critical point, this procedure requires the computation of a value
$0 < \lambda < 1$. The decision rule is to reject $H_0$ if $T_0$ is greater (less) than the critical point, reject $H_0$
with probability $\lambda$ if $T_0$ is equal to the critical point, and accept $H_0$ otherwise. If this option
is employed, the $\lambda$ values are printed as well as the critical points, and the power is computed
on the basis of a critical region of exact size $\alpha$.
BIN1POW

PURPOSE 

Program BIN1POW (Binomial Power, 1 Population) determines the critical (rejection)
region and evaluates the power function for tests of hypotheses regarding the proportion of
successes p in a binomial population. The tests are based on a random sample of size n
from the population assuming that p remains constant from trial to trial and that the trial
outcomes (success or failure) are mutually independent. The three relevant hypotheses and
their corresponding test statistics, critical regions, and power functions are shown below.
In each case,

$\alpha$ = size of the critical region
  = Prob(committing a Type I error)

and the test statistic

$T_0 = X$ = number of successes in n binomial trials.

The probability distribution of $T_0$ depends on the hypothesized value of the parameter p ($p_0$
or $p_1 \ne p_0$). For $p = p_1$ this distribution will be denoted by $g(t_0 \mid p_1)$ and has the form

$$g(t_0 \mid p_1) = \binom{n}{t_0} p_1^{t_0} (1 - p_1)^{n - t_0}, \quad t_0 = 0, 1, 2, \ldots, n .$$

The critical region for the test is based on $g(t_0 \mid p_0)$ and the power function is expressed in
terms of $g(t_0 \mid p_1)$.

*********************

Hypothesis (1):   $H_0: p = p_0$
                  $H_1: p \ne p_0$

Critical Region:  Reject $H_0$ if $T_0 < c_1$
                  Reject $H_0$ with probability $\gamma_1$ if $T_0 = c_1$
                  Reject $H_0$ if $T_0 > c_2$
                  Reject $H_0$ with probability $\gamma_2$ if $T_0 = c_2$

where $c_1$ is the largest integer such that

$$B_1 = \sum_{t_0=0}^{c_1-1} g(t_0 \mid p_0) \le \alpha/2 ,$$

$c_2$ is the smallest integer such that

$$B_2 = \sum_{t_0=c_2+1}^{n} g(t_0 \mid p_0) \le \alpha/2 ,$$

and

$$\gamma_1 = (\alpha/2 - B_1) / g(c_1 \mid p_0) \quad \text{(randomized test)}$$
$$\gamma_2 = (\alpha/2 - B_2) / g(c_2 \mid p_0) \quad \text{(randomized test)}$$
$$\gamma_1 = \gamma_2 = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(p_1, n) = \sum_{t_0=0}^{c_1-1} g(t_0 \mid p_1) + \gamma_1 g(c_1 \mid p_1) + \sum_{t_0=c_2+1}^{n} g(t_0 \mid p_1) + \gamma_2 g(c_2 \mid p_1) .$$

*********************

Hypothesis (2):   $H_0: p \le p_0$
                  $H_1: p > p_0$

Critical Region:  Reject $H_0$ if $T_0 > c$
                  Reject $H_0$ with probability $\gamma$ if $T_0 = c$

where c is the smallest integer such that

$$A_1 = \sum_{t_0=c+1}^{n} g(t_0 \mid p_0) \le \alpha$$

and

$$\gamma = (\alpha - A_1) / g(c \mid p_0) \quad \text{(randomized test)}$$
$$\gamma = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(p_1, n) = \sum_{t_0=c+1}^{n} g(t_0 \mid p_1) + \gamma g(c \mid p_1) .$$

*********************

Hypothesis (3):   $H_0: p \ge p_0$
                  $H_1: p < p_0$

Critical Region:  Reject $H_0$ if $T_0 < c$
                  Reject $H_0$ with probability $\gamma$ if $T_0 = c$

where c is the largest integer such that

$$A_2 = \sum_{t_0=0}^{c-1} g(t_0 \mid p_0) \le \alpha$$

and

$$\gamma = (\alpha - A_2) / g(c \mid p_0) \quad \text{(randomized test)}$$
$$\gamma = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(p_1, n) = \sum_{t_0=0}^{c-1} g(t_0 \mid p_1) + \gamma g(c \mid p_1) .$$


FEATURES 

BIN1POW features include the following:

*  Computation of the critical value c and randomized constant $\gamma$ for up to 50 values
of the sample size n and any one of the 3 hypotheses.

*  Computation of power function for each value of n and up to 15 values of $p = p_1$.
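
The computations listed above can be illustrated for hypothesis (2), the upper-tailed test. The Python sketch below is not the BIN1POW source; the function names are hypothetical and it assumes $0 < p_0 < 1$. It finds the smallest c whose upper tail under $p_0$ does not exceed $\alpha$, computes the randomizing constant $\gamma$, and evaluates the power at a chosen $p_1$.

```python
from math import comb


def g(t, n, p):
    """Binomial probability of t successes in n trials with success probability p."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


def upper_tail_test(n, p0, alpha, randomized=True):
    """Return (c, gamma) for H0: p <= p0 against H1: p > p0."""
    for c in range(n + 1):
        a1 = sum(g(t, n, p0) for t in range(c + 1, n + 1))   # Prob(T0 > c | p0)
        if a1 <= alpha:
            break                                            # smallest such c
    gamma = (alpha - a1) / g(c, n, p0) if randomized else 0.0
    return c, gamma


def power(n, p1, c, gamma):
    """Probability of rejecting H0 when the true proportion is p1."""
    return sum(g(t, n, p1) for t in range(c + 1, n + 1)) + gamma * g(c, n, p1)


c, gamma = upper_tail_test(n=10, p0=0.5, alpha=0.05)
print(c, round(gamma, 4))
print(round(power(10, 0.5, c, gamma), 4))   # equals alpha for the randomized test
print(round(power(10, 0.8, c, gamma), 4))   # power at p1 = 0.8
```

With the randomized option the power at $p_1 = p_0$ equals $\alpha$ exactly; with $\gamma = 0$ the attained size is the tail probability $A_1$, which may be considerably smaller than $\alpha$.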
INPUT GUIDE

The user-created input file consists of two record types as described below.

Record
Type  Variable  Description                                      Columns  Format

1     NHYP      =1, hypothesis (1)                               1-5      I5
                =2, hypothesis (2)
                =3, hypothesis (3)

      NTYPE     =1, randomized test                              6-10     I5
                =2, non-randomized test

      NSS       Number of N(I) values (n values) to be           11-15    I5
                read in on record type 2.
                (NSS ≤ 50)

      K         Number of $p_1$ values requested via              16-20    I5
                $p_1 = p_0 \pm i$*DELTA, i = 0,...,K for NHYP = 1
                and via
                $p_1 = p_0 + i$*DELTA, i = 0,...,K for NHYP = 2,3.
                K ≤ 7 for NHYP = 1
                K ≤ 14 for NHYP = 2,3.

      DELTA     Increment for generation of $p_1$ values.         21-30    F10.5

      PSUB0     Hypothesized value of p, i.e., $p_0$.             31-40    F10.5

      ALPHA     Significance level $\alpha$.                      41-50    F10.5

Record type 2 contains the NSS N(I) values in format 10I8. As many as 5 type 2 records
may be required.
BIN2POW


PURPOSE 


Program BIN2POW (Binomial Power, 2 Populations) determines the critical (rejection)
region and evaluates the power function for tests of hypotheses regarding the
difference in the proportions of success in two binomial populations based on random
samples of sizes m and n from populations 1 and 2, respectively. For population i (i = 1,2),
it is assumed that the proportion of success $p_i$ remains constant from trial to trial and that
the trial outcomes (success or failure) are mutually independent. The table below summarizes
the notation regarding the test variables.

              Proportion    Number of    Number of    Sample
Population    of Success    Successes    Failures     Size

    1           $p_1$          X           m-X          m
    2           $p_2$          Y           n-Y          n
                             T=X+Y        m+n-T        m+n

This table shows that X and Y can realize integral values from 0 to m and 0 to n,
respectively, so that the sample space can be designated by an (m + 1) x (n + 1) array. The method
of analysis (Reference 4) forms the critical region on the array by working in terms of the
conditional distribution of Y given T for T = 0,1,2,...,m+n. The power of the test is then
obtained by summing probabilities over the rejection region for specific values of the p_i
under the alternative hypothesis. See the COMMENTS section for an example which
shows the rejection region in the two-way sample space. The three relevant hypotheses and
their corresponding test statistics, critical regions, and power functions are shown below.
In each case,

α = size of the critical region

  = Prob(committing a Type I error)

and the test statistic

Y = number of successes in the sample of size n from Population 2.

The conditional probability distribution of Y given T = t has the form

$$P(Y = y \mid X + Y = t) = \frac{\binom{m}{t-y}\binom{n}{y}}{\binom{m+n}{t}}, \qquad \max(0,\,t-m) \le y \le \min(t,\,n), \qquad t = 0,1,2,\ldots,m+n$$

when the null hypothesis H_0: p_1 = p_2 is true. This distribution is written P_0(y | t) below.
The critical region for the test is based on this distribution for each value of T. The power
of the test is not dependent on T and is obtained by summing probabilities over the rejection
region. These unconditional probabilities are based on the joint probability distribution of
X and Y which has the form

$$P_1(x,y) = \binom{m}{x} p_1^{x}(1-p_1)^{m-x} \binom{n}{y} p_2^{y}(1-p_2)^{n-y}, \qquad x = 0,1,\ldots,m; \quad y = 0,1,\ldots,n,$$

where the subscript 1 on P indicates specific values for p_1 and p_2 under the alternative
hypothesis. The power functions below are expressed in terms of P_1(x,y).
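
The two distributions above can be evaluated directly. The following is a minimal Python sketch, not the STATLIB implementation; the function names `cond_pmf` and `joint_pmf` are illustrative assumptions, and the values m = 8, n = 10, t = 6 are borrowed from the example in the COMMENTS section.

```python
from math import comb

def cond_pmf(y, t, m, n):
    """P0(y | t): probability of Y = y given X + Y = t when p1 = p2."""
    if y < max(0, t - m) or y > min(t, n):
        return 0.0
    return comb(m, t - y) * comb(n, y) / comb(m + n, t)

def joint_pmf(x, y, m, n, p1, p2):
    """P1(x, y): joint probability of X = x and Y = y for given p1, p2."""
    px = comb(m, x) * p1**x * (1 - p1)**(m - x)
    py = comb(n, y) * p2**y * (1 - p2)**(n - y)
    return px * py

# Conditional distribution of Y for t = 6 with m = 8, n = 10.
m, n, t = 8, 10, 6
for y in range(max(0, t - m), min(t, n) + 1):
    print(y, cond_pmf(y, t, m, n))
```

Both functions use exact integer binomial coefficients, so for sample sizes of a few hundred no special overflow handling is needed in this sketch.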


Hypothesis (1):    H_0: p_1 = p_2
                   H_1: p_1 ≠ p_2.

Critical Region    Reject H_0 if Y < y_L
for T = t:         Reject H_0 with probability γ_L if Y = y_L
                   Reject H_0 if Y > y_U
                   Reject H_0 with probability γ_U if Y = y_U

where y_L is the largest integer such that

$$B_1 = \sum_{y=\min y}^{y_L-1} P_0(y \mid t) \le \alpha/2,$$

y_U is the smallest integer such that

$$B_2 = \sum_{y=y_U+1}^{\max y} P_0(y \mid t) \le \alpha/2,$$

min y = max(0, t - m), max y = min(t, n), and

$$\gamma_L = (\alpha/2 - B_1)/P_0(y_L \mid t) \quad \text{(randomized test)}$$
$$\gamma_U = (\alpha/2 - B_2)/P_0(y_U \mid t) \quad \text{(randomized test)}$$
$$\gamma_L = \gamma_U = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(p_1,p_2,m,n) = \sum_{t=0}^{m+n} \left[ POW_L(t) + POW_U(t) \right]$$

where

$$POW_L(t) = \sum_{y=\min y}^{y_L-1} P_1(t-y,\,y) + \gamma_L P_1(t-y_L,\,y_L)$$

and

$$POW_U(t) = \sum_{y=y_U+1}^{\max y} P_1(t-y,\,y) + \gamma_U P_1(t-y_U,\,y_U).$$

****************

Hypothesis (2):    H_0: p_1 ≤ p_2
                   H_1: p_1 > p_2.

Critical Region    Reject H_0 if Y < y_L
for T = t:         Reject H_0 with probability γ_L if Y = y_L

where y_L is the largest integer such that

$$A_1 = \sum_{y=\min y}^{y_L-1} P_0(y \mid t) \le \alpha$$

and

$$\gamma_L = (\alpha - A_1)/P_0(y_L \mid t) \quad \text{(randomized test)}$$
$$\gamma_L = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(p_1,p_2,m,n) = \sum_{t=0}^{m+n} POW_L(t)$$

where

$$POW_L(t) = \sum_{y=\min y}^{y_L-1} P_1(t-y,\,y) + \gamma_L P_1(t-y_L,\,y_L).$$

****************

Hypothesis (3):    H_0: p_1 ≥ p_2
                   H_1: p_1 < p_2.

Critical Region    Reject H_0 if Y > y_U
for T = t:         Reject H_0 with probability γ_U if Y = y_U

where y_U is the smallest integer such that

$$A_2 = \sum_{y=y_U+1}^{\max y} P_0(y \mid t) \le \alpha$$

and

$$\gamma_U = (\alpha - A_2)/P_0(y_U \mid t) \quad \text{(randomized test)}$$
$$\gamma_U = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(p_1,p_2,m,n) = \sum_{t=0}^{m+n} POW_U(t)$$

where

$$POW_U(t) = \sum_{y=y_U+1}^{\max y} P_1(t-y,\,y) + \gamma_U P_1(t-y_U,\,y_U).$$
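
The critical-region and power definitions for the three hypotheses can be sketched in Python as follows. This is an illustrative sketch under the assumption that every tail inequality is "less than or equal to"; the names `lower_cut`, `upper_cut` and `power` are hypothetical and do not correspond to BIN2POW routines.

```python
from math import comb

def cond_pmf(y, t, m, n):
    """P0(y | t) under H0: p1 = p2 (caller keeps y inside its valid range)."""
    return comb(m, t - y) * comb(n, y) / comb(m + n, t)

def joint_pmf(x, y, m, n, p1, p2):
    """P1(x, y) for specific p1 and p2."""
    return (comb(m, x) * p1**x * (1 - p1)**(m - x)
            * comb(n, y) * p2**y * (1 - p2)**(n - y))

def lower_cut(t, m, n, level, randomized):
    """Largest y_L whose lower tail sum is <= level, with its gamma_L."""
    lo, hi = max(0, t - m), min(t, n)
    y_l, below = lo, 0.0
    while y_l < hi and below + cond_pmf(y_l, t, m, n) <= level:
        below += cond_pmf(y_l, t, m, n)
        y_l += 1
    gamma = (level - below) / cond_pmf(y_l, t, m, n) if randomized else 0.0
    return y_l, gamma

def upper_cut(t, m, n, level, randomized):
    """Smallest y_U whose upper tail sum is <= level, with its gamma_U."""
    lo, hi = max(0, t - m), min(t, n)
    y_u, above = hi, 0.0
    while y_u > lo and above + cond_pmf(y_u, t, m, n) <= level:
        above += cond_pmf(y_u, t, m, n)
        y_u -= 1
    gamma = (level - above) / cond_pmf(y_u, t, m, n) if randomized else 0.0
    return y_u, gamma

def power(p1, p2, m, n, alpha, hyp, randomized=True):
    """Unconditional power: sum of POW_L(t) and/or POW_U(t) over t."""
    level = alpha / 2 if hyp == 1 else alpha
    total = 0.0
    for t in range(m + n + 1):
        lo, hi = max(0, t - m), min(t, n)
        if hyp in (1, 2):
            y_l, g_l = lower_cut(t, m, n, level, randomized)
            total += sum(joint_pmf(t - y, y, m, n, p1, p2) for y in range(lo, y_l))
            total += g_l * joint_pmf(t - y_l, y_l, m, n, p1, p2)
        if hyp in (1, 3):
            y_u, g_u = upper_cut(t, m, n, level, randomized)
            total += sum(joint_pmf(t - y, y, m, n, p1, p2) for y in range(y_u + 1, hi + 1))
            total += g_u * joint_pmf(t - y_u, y_u, m, n, p1, p2)
    return total

print(power(0.8, 0.2, 8, 10, 0.05, hyp=2))
```

A useful check on any such implementation is that the randomized test has size exactly α: with p_1 = p_2 the computed power should equal α for each hypothesis, while the non-randomized test should not exceed α.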


FEATURES 

BIN2POW  features  include  the  following: 

*  Computation of the critical values and randomized constant for sample sizes m +
n < 850 and any one of the 3 hypotheses.

*  Computation of the power function for all pairs of p_1 and p_2 where the limitation
on the number of values of each p_i is 25. This will provide as many as 625
evaluations of power in a single run.

*  A plot of equal power contours in the (p_1, p_2) domain for as many as 10 values of
power.

*  Computation of conditional power for specified realizations t of T and as many as
32 values of the parameter ρ = p_2 q_1 / (q_2 p_1) (q_i = 1 - p_i). See the COMMENTS
section for a discussion of this feature.

*  A plot of the conditional power as a function of ρ for each value of T =
0,1,...,m+n.

*  Capability  to  perform  multiple  cases  in  a  single  run. 

REFERENCES 

1.  Bennett, B. M. and Hsu, P. (1960), “On the Power Function of the Exact Test for the
2 X 2 Contingency Table”, Biometrika, Vol. 47.

2.  Finney, D. J. (1948), “The Fisher-Yates Test of Significance in 2 X 2 Contingency
Tables”, Biometrika, Vol. 35.

3.  Fisher, R. A. (1967), Statistical Methods for Research Workers, Hafner Publishing
Co., Inc., pp. 85 - 99.

4.  Lehmann, E. L. (1959), Testing Statistical Hypotheses, John Wiley and Sons, Inc.,
pp. 140 - 143.

5.  Robertson, W. H. (1966), “Programming Fisher’s Exact Method of Comparing Two
Percentages”, Technometrics, Vol. 2.


INPUT  GUIDE 

The user-created input file consists of as many as 10 record types. The first three are
required for all runs. Record types 4 and 5 are optional for conditional power computation,
6 through 9 are optional for unconditional power computation, and 10 is optional for power
contour plotting.

Record
Type   Variable   Description                                        Columns   Format

1      NCASES     Number of cases in this job. If NCASES > 1,        1-2       I2
                  records 2, 3 and optional records must be
                  repeated following the last record for each
                  case.

2      MSAMP      Sample size from population 1.                     1-5       I5

       NSAMP      Sample size from population 2.                     6-10      I5

3      NUMHYP     =1, hypothesis (1)                                 1         I1
                  =2, hypothesis (2)
                  =3, hypothesis (3)

       ALPHA      Significance level α.                              6-10      F5.4

       IGAMPU     =0, randomized test                                15        I1
                  =1, non-randomized test

       ICP        =0, conditional power not requested.               20        I1
                  =1, conditional power requested. Records
                  4 and 5 required.

       IUCP       =0, unconditional power not requested.             25        I1
                  =1, unconditional power requested. Records
                  6-9 required.

       NPLOT      =0, conditional power plots not requested.         30        I1
                  =1, conditional power plots requested.
                  (Be careful. A “1” will generate m+n
                  plots.)

Record types 4 and 5 required only if ICP = 1, i.e., conditional power requested.

4      NRHOS      Number of values of ρ.                             1-2       I2
                  (NRHOS ≤ 32)

5      VALRHO(I)  The values of ρ. As many as 4 records              1-70      10F7.2
                  may be required. The ρ’s should be
                  entered in ascending order.

Record types 6-10 required only if IUCP = 1, i.e., unconditional power requested.

6      NP1S       Number of p_1’s for unconditional power.           1-2       I2

       NP2S       Number of p_2’s for unconditional power.           6-7       I2

7      P1(I)      The values of p_1. As many as 2 records            1-80      16(2X,F3.2)
                  may be required to enter the NP1S values.

8      P2(I)      The values of p_2. As many as 2 records            1-80      16(2X,F3.2)
                  may be required to enter the NP2S values.

9      NOPLOT     Number of equal probability power contours         1-5       I5
                  to plot.
                  >0, record type 10 required.
                  =0, contour power plots not requested.
                  Record type 10 must be excluded.
                  (NOPLOT ≤ 10)

10     BETA(I)    The power values for the contour plots.            1-50      10F5.4

COMMENTS 

As  an  aid  to  understanding  the  testing  mechanism  employed  in  BIN2POW,  the  sample 
space  for  the  outcome  of  an  experiment  with  m  =  8  and  n  =  10  is  shown  in  Figure  1.  Also 
shown  are  the  values  of  X  +  Y  =  T  for  T  =  0,1,2,. ..,18  and  an  arbitrary  rejection  region  for 
a non-randomized test of hypothesis (1). The power of this test would be obtained by
summing the joint probabilities in the rejection region for specific values of the p_i.

The concept of conditional power was not discussed in the PURPOSE section and will
be addressed here. Conditional power of the test is the probability of rejecting the null
hypothesis given knowledge of the number of successes in the combined sample from both
populations. The conditional power is based on the conditional distribution of Y given X +
Y = t. This probability distribution is as follows:

$$g_t(y \mid t) = \frac{\binom{m}{t-y}\binom{n}{y}\rho^{y}}{\sum_{j=\min y}^{\max y}\binom{m}{t-j}\binom{n}{j}\rho^{j}}, \qquad \min y \le y \le \max y,$$

where min y = max(0, t - m), max y = min(t, n), and ρ = p_2 q_1 / (q_2 p_1). Hence, for a given t
and ρ, power is obtained by summing probabilities over the rejection region via g_t(y | t).

Defining

$$CP_L(t,\rho) = \sum_{y=\min y}^{y_L-1} g_t(y \mid t) + \gamma_L\, g_t(y_L \mid t)$$

and

$$CP_U(t,\rho) = \sum_{y=y_U+1}^{\max y} g_t(y \mid t) + \gamma_U\, g_t(y_U \mid t),$$

we can write conditional power as a function of CP_L(t, ρ) and CP_U(t, ρ) for each hypothesis
as shown below:

Hypothesis    Conditional Power

1             CP_L(t, ρ) + CP_U(t, ρ)

2             CP_L(t, ρ)

3             CP_U(t, ρ)
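
As a minimal sketch of the conditional power computation, assuming the critical values y_L, y_U and randomization constants are already known for the given t, the Python fragment below evaluates g_t(y | t), CP_L and CP_U. The function names are illustrative only; the example values t = 6, y_L = 1, y_U = 5 are those quoted for the arbitrary rejection region of Figure 1.

```python
from math import comb

def g(y, t, m, n, rho):
    """Conditional probability g_t(y | t) for odds-ratio parameter rho."""
    lo, hi = max(0, t - m), min(t, n)
    if not lo <= y <= hi:
        return 0.0
    weights = {j: comb(m, t - j) * comb(n, j) * rho**j for j in range(lo, hi + 1)}
    return weights[y] / sum(weights.values())

def conditional_power(t, m, n, rho, hyp, y_l, y_u, gamma_l=0.0, gamma_u=0.0):
    """CP_L + CP_U, CP_L, or CP_U for hypothesis 1, 2, or 3 respectively."""
    lo, hi = max(0, t - m), min(t, n)
    cp_l = (sum(g(y, t, m, n, rho) for y in range(lo, y_l))
            + gamma_l * g(y_l, t, m, n, rho))
    cp_u = (sum(g(y, t, m, n, rho) for y in range(y_u + 1, hi + 1))
            + gamma_u * g(y_u, t, m, n, rho))
    return {1: cp_l + cp_u, 2: cp_l, 3: cp_u}[hyp]

# Non-randomized test of hypothesis (1), m = 8, n = 10, t = 6.
for rho in (0.25, 1.0, 4.0):
    print(rho, conditional_power(6, 8, 10, rho, hyp=1, y_l=1, y_u=5))
```

At ρ = 1 the distribution reduces to the null (hypergeometric) distribution, so the value returned is the conditional size of the region for that t.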

BIN2POW has an option to plot the conditional power as a function of ρ (for up to 32 input
values of ρ) for all t = 0,1,...,m+n. One should use caution in exercising this option
(NPLOT=1 on Record Type 3) since it provides a single plot for each t or a total of m + n
plots. If m or n is large, this will lead to the generation of considerable output.

FIGURE 1. Sample space for hypothesis 1 with m = 8 and n = 10. (Arbitrary rejection
region for exemplification only. For t = 6, y_L = 1 and y_U = 5. For t = 8, y_L = 3 and
y_U = 6, etc.)

POI1POW


PURPOSE

Program POI1POW (Poisson Power, 1 Population) determines the critical (rejection)
region and evaluates the power function for tests of hypotheses regarding the Poisson
parameter λ. The tests are based on a random sample of size n from a Poisson population
with probability distribution function

$$f(x) = e^{-\lambda}\lambda^{x}/x!\,, \qquad \lambda > 0; \quad x = 0,1,2,\ldots$$

where x is the number of events in a time or space interval of length t (hereinafter referred
to as a time unit) and λ is the mean rate or intensity, i.e., the expected number of events in
a time unit is λ. The three relevant hypotheses and their corresponding test statistics,
critical regions, and power functions are shown below. In each case,

α = size of the critical region

  = Prob(committing a Type I error)

and the test statistic

T_0 = number of events in n time units (n time or space intervals of
length t).

The probability distribution of T_0 depends on the hypothesized value of the parameter λ (λ_0
or λ_1 ≠ λ_0). For λ = λ_i, this distribution will be denoted by g(t_0 | λ_i) and has the form

$$g(t_0 \mid \lambda_i) = e^{-n\lambda_i}(n\lambda_i)^{t_0}/t_0!\,, \qquad t_0 = 0,1,2,\ldots$$

The critical region for the test is based on g(t_0 | λ_0), and the power function is expressed in
terms of g(t_0 | λ_1).

*********************

Hypothesis (1):    H_0: λ = λ_0
                   H_1: λ ≠ λ_0.

Critical Region:   Reject H_0 if T_0 < c_1
                   Reject H_0 with probability γ_1 if T_0 = c_1
                   Reject H_0 if T_0 > c_2
                   Reject H_0 with probability γ_2 if T_0 = c_2

where c_1 is the largest integer such that

$$B_1 = \sum_{t_0=0}^{c_1-1} g(t_0 \mid \lambda_0) \le \alpha/2,$$

c_2 is the smallest integer such that

$$B_2 = \sum_{t_0=c_2+1}^{\infty} g(t_0 \mid \lambda_0) \le \alpha/2,$$

and

$$\gamma_1 = (\alpha/2 - B_1)/g(c_1 \mid \lambda_0) \quad \text{(randomized test)}$$
$$\gamma_2 = (\alpha/2 - B_2)/g(c_2 \mid \lambda_0) \quad \text{(randomized test)}$$
$$\gamma_1 = \gamma_2 = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(\lambda_1,n) = \sum_{t_0=0}^{c_1-1} g(t_0 \mid \lambda_1) + \gamma_1\, g(c_1 \mid \lambda_1) + \sum_{t_0=c_2+1}^{\infty} g(t_0 \mid \lambda_1) + \gamma_2\, g(c_2 \mid \lambda_1)$$

*********************

Hypothesis (2):    H_0: λ ≤ λ_0
                   H_1: λ > λ_0.

Critical Region:   Reject H_0 if T_0 > c
                   Reject H_0 with probability γ if T_0 = c

where c is the smallest integer such that

$$A_1 = \sum_{t_0=c+1}^{\infty} g(t_0 \mid \lambda_0) \le \alpha$$

and

$$\gamma = (\alpha - A_1)/g(c \mid \lambda_0) \quad \text{(randomized test)}$$
$$\gamma = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(\lambda_1,n) = \sum_{t_0=c+1}^{\infty} g(t_0 \mid \lambda_1) + \gamma\, g(c \mid \lambda_1).$$

*********************

Hypothesis (3):    H_0: λ ≥ λ_0
                   H_1: λ < λ_0.

Critical Region:   Reject H_0 if T_0 < c
                   Reject H_0 with probability γ if T_0 = c

where c is the largest integer such that

$$A_2 = \sum_{t_0=0}^{c-1} g(t_0 \mid \lambda_0) \le \alpha$$

and

$$\gamma = (\alpha - A_2)/g(c \mid \lambda_0) \quad \text{(randomized test)}$$
$$\gamma = 0 \quad \text{(non-randomized test)}$$

Power Function:

$$\Pi(\lambda_1,n) = \sum_{t_0=0}^{c-1} g(t_0 \mid \lambda_1) + \gamma\, g(c \mid \lambda_1).$$
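
The Poisson critical values and power functions above can be sketched in Python as follows. This is an illustrative sketch using exact sums only, with "less than or equal to" assumed in every tail inequality; it does not attempt the normal approximation that POI1POW applies for large N(I)*LAMDA, and the function names are hypothetical.

```python
from math import exp, lgamma, log

def g(t0, n, lam):
    """Poisson probability of t0 events in n time units with rate lam."""
    mean = n * lam
    return exp(t0 * log(mean) - mean - lgamma(t0 + 1))

def lower_cut(n, lam0, level, randomized):
    """Largest c with sum_{t0 < c} g(t0 | lam0) <= level, and its gamma."""
    c, below = 0, 0.0
    while below + g(c, n, lam0) <= level:
        below += g(c, n, lam0)
        c += 1
    gamma = (level - below) / g(c, n, lam0) if randomized else 0.0
    return c, gamma

def upper_cut(n, lam0, level, randomized):
    """Smallest c with sum_{t0 > c} g(t0 | lam0) <= level, and its gamma."""
    c, cdf = 0, g(0, n, lam0)
    while 1.0 - cdf > level:
        c += 1
        cdf += g(c, n, lam0)
    gamma = (level - (1.0 - cdf)) / g(c, n, lam0) if randomized else 0.0
    return c, gamma

def power(lam1, lam0, n, alpha, hyp, randomized=True):
    """Power at lam1 for hypothesis 1 (two-sided), 2 (upper), or 3 (lower)."""
    level = alpha / 2 if hyp == 1 else alpha
    total = 0.0
    if hyp in (1, 3):
        c, gam = lower_cut(n, lam0, level, randomized)
        total += sum(g(k, n, lam1) for k in range(c)) + gam * g(c, n, lam1)
    if hyp in (1, 2):
        c, gam = upper_cut(n, lam0, level, randomized)
        total += 1.0 - sum(g(k, n, lam1) for k in range(c + 1)) + gam * g(c, n, lam1)
    return total

print(power(1.0, 0.5, 10, 0.05, hyp=2))
```

The infinite upper-tail sums are evaluated as one minus a finite cumulative sum, which is adequate for moderate n*λ but loses precision for very small tail probabilities.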


FEATURES 

POI1POW  features  include  the  following: 

*  Computation  of  the  critical  value  c  and  randomized  constant  y  for  up  to  50  values 
of  the  sample  size  n  and  any  one  of  the  3  hypotheses. 

*  Computation of power function for each value of n and up to 15 values of λ = λ_1.


REFERENCES 

1.  Feller,  William  (1960),  An  Introduction  to  Probability  Theory  and  Its  Applications, 
John  Wiley  and  Sons,  Inc.,  pp.  146  -  154. 

2.  Krutchkoff, Richard G. (1970), Probability and Statistical Inference, Gordon and
Breach Science Publishers, pp. 224 - 229.

INPUT GUIDE

The user-created input file consists of two record types as described below.

Record
Type   Variable   Description                                        Columns   Format

1      NHYP       =1, hypothesis (1)                                 1-5       I5
                  =2, hypothesis (2)
                  =3, hypothesis (3)

       NTYPE      =1, randomized test                                6-10      I5
                  =2, non-randomized test

       NSS        Number of N(I) values (n values) to be             11-15     I5
                  read in on record type 2.
                  (NSS ≤ 50)

       K          Number of λ_1 values requested via                 16-20     I5
                  λ_1 = λ_0 ± i*DELTA, i = 0,...,K
                  for NHYP = 1 and via
                  λ_1 = λ_0 + i*DELTA, i = 0,...,K
                  for NHYP = 2,3.
                  K ≤ 7 for NHYP = 1.
                  K ≤ 14 for NHYP = 2,3.

       DELTA      Increment for generation of λ_1 values.            21-30     F10.5

       LAMDA      Hypothesized value of λ, i.e., λ_0.                31-40     F10.5

       ALPHA      Significance level α.                              41-50     F10.5

       LIMIT      N(I)*LAMDA > LIMIT activates the normal            51-60     I10
                  approximation to the Poisson.
                  (DEFAULT = 1000.)

Record type 2 contains the NSS N(I) values in format 10I8. As many as 5 type 2 records
may be required.

CONTINUOUS POWER EVALUATIONS

NOR1PW


PURPOSE


Program NOR1PW (Normal Power, 1 Population) evaluates the power function for
tests of hypotheses regarding the mean of a normal population. The tests are based on a
random sample of size n from a normal population having known variance σ². Let X be a
normal random variable with unknown mean μ and known variance σ². Then the sample
mean $\bar{X}$ is a normal random variable with unknown mean μ and known variance σ²/n,
where $\bar{X} = \sum_{i=1}^{n} X_i/n$. These tests are commonly referred to as one-sample Z tests (see
References 1 and 2). The three relevant hypotheses and their corresponding test statistics,
critical regions, and power functions are given below. In each case,

α = size of the critical region

  = Prob(committing a Type I error)

and the test statistic

$$T_0 = Z_0 = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$$

where μ_0 is the hypothesized value of the true mean μ. The probability density function of
T_0 is normal with a variance of 1. Its mean depends on the true value of the parameter μ (μ_0
or μ_1 ≠ μ_0). For μ = μ_i, this density will be denoted by g(t_0 | μ_i). Define δ = |μ_1 - μ_0| / σ.
Then g(t_0 | μ_i) has the form

$$g(t_0 \mid \mu_i) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}\left[t_0 - \sqrt{n}\,(\mu_i - \mu_0)/\sigma\right]^2\right\}, \qquad -\infty < t_0 < \infty,$$

so that the mean of T_0 is 0 for μ_i = μ_0 and has absolute value $\sqrt{n}\,\delta$ for μ_i = μ_1.

The critical region for the test is based on g(t_0 | μ_0) and the power function is expressed in
terms of g(t_0 | μ_1).


*********************

Hypothesis (1):    H_0: μ = μ_0
                   H_1: μ ≠ μ_0.

Critical Region:   Reject H_0 if |T_0| > z_{1-α/2}

where z_{1-α/2} is the 100(1 - α/2)th percentage point of the standard normal
distribution (i.e., mean 0 and variance 1). Hence,




NSWC  TR  89-97 


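$$\alpha/2 = \int_{z_{1-\alpha/2}}^{\infty} g(t_0 \mid \mu_0)\,dt_0.$$

Power Function:

$$\Pi(\mu_1,n) = \int_{-\infty}^{z_{\alpha/2}} g(t_0 \mid \mu_1)\,dt_0 + \int_{z_{1-\alpha/2}}^{\infty} g(t_0 \mid \mu_1)\,dt_0.$$

*********************

Hypothesis (2):    H_0: μ ≤ μ_0
                   H_1: μ > μ_0.

Critical Region:   Reject H_0 if T_0 > z_{1-α}

Hence,

$$\alpha = \int_{z_{1-\alpha}}^{\infty} g(t_0 \mid \mu_0)\,dt_0.$$

Power Function:

$$\Pi(\mu_1,n) = \int_{z_{1-\alpha}}^{\infty} g(t_0 \mid \mu_1)\,dt_0.$$

*********************

Hypothesis (3):    H_0: μ ≥ μ_0
                   H_1: μ < μ_0.

Critical Region:   Reject H_0 if T_0 < z_α

Hence,

$$\alpha = \int_{-\infty}^{z_{\alpha}} g(t_0 \mid \mu_0)\,dt_0.$$

Power Function:

$$\Pi(\mu_1,n) = \int_{-\infty}^{z_{\alpha}} g(t_0 \mid \mu_1)\,dt_0.$$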
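
Because T_0 is normal with unit variance, the power integrals above reduce to evaluations of the standard normal distribution function. The following Python sketch illustrates this using the standard library; the function name `nor1_power` is an assumption for illustration, and delta is taken to be the standardized distance |μ_1 - μ_0| / σ in the direction of the alternative.

```python
from math import sqrt
from statistics import NormalDist

def nor1_power(delta, n, alpha, two_sided):
    """Power of the one-sample Z test at standardized distance delta."""
    std = NormalDist()               # standard normal, mean 0, variance 1
    shift = sqrt(n) * delta          # absolute mean of T0 under the alternative
    if two_sided:                    # hypothesis (1)
        z = std.inv_cdf(1 - alpha / 2)
        return std.cdf(-z - shift) + 1 - std.cdf(z - shift)
    z = std.inv_cdf(1 - alpha)       # hypothesis (2) or (3), by symmetry
    return 1 - std.cdf(z - shift)

# A small power table: rows are n, columns are delta.
for n in (4, 16, 64):
    print(n, [round(nor1_power(d, n, 0.05, two_sided=False), 4) for d in (0.25, 0.5, 1.0)])
```

At delta = 0 the function returns α for either test type, which gives a quick consistency check.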


FEATURES 

NOR1PW allows the user to process up to 25 cases in a single computer run. Each
case corresponds to a single range of the user-specified Type I errors. In NOR1PW the
user specifies the type of test (i.e., one- or two-sided), and a range of values for sample size,
Type I error, and delta, where delta is the number of standard deviations that the true mean
is from the hypothesized mean in absolute value. Hypotheses (2) and (3) are one-sided
tests while hypothesis (1) is a two-sided test. Each page of NOR1PW output consists of
the following:

*  A power table for the specified test type and Type I error for up to 40 values of n
and up to 9 values of delta.


REFERENCES 

1.  Bowker,  A.  H.  and  Lieberman,  G.  J.  (1972),  Engineering  Statistics,  Second  Edition, 
Prentice-Hall,  Inc.,  pp.  183  -  198. 

2.  Brownlee,  K.  A.  (1965),  Statistical  Theory  and  Methodology  in  Science  and  Engi¬ 
neering,  Second  Edition,  John  Wiley  and  Sons,  Inc.,  pp.  105  -  109  and  pp.  113  -  118. 

3.  Freund,  John  E.  (1962),  Mathematical  Statistics,  Prentice-Hall,  Inc.,  pp.  262  -  263. 


INPUT  GUIDE 

The  user  must  create  a  data  file  containing  the  following  record  types.  The  first  record 
type  specifies  the  number  of  cases  (NCASES)  to  be  run.  The  second  record  type  specifies 
the  limits  for  delta,  sample  size,  and  Type  I  error  together  with  the  test  type.  If  NCASES 
is  greater  than  one,  record  type  2  must  be  repeated  for  each  case. 

More  specifically  the  required  data  file  is  constructed  as  follows: 


Record
Type   Variable   Description                                        Columns   Format

1      NCASES     Number of cases                                    1-5       I5

2      DELTALL    Lower limit of delta values                        1-8       F8.0
                  (DELTALL > 0.0)

       DINCR      Delta value increment                              9-16      F8.0
                  (DINCR > 0.0)

       DELTAUL    Upper limit of delta values                        17-24     F8.0
                  (DELTAUL > 0.0)

       FNLL       Lower limit of sample size values                  25-32     F8.0
                  (FNLL > 0)

       FNINCR     Sample size value increment                        33-40     F8.0
                  (FNINCR > 0)

       FNUL       Upper limit of sample size values                  41-48     F8.0
                  (FNUL > 0)

       ALPHALL    Lower limit of Type I error values                 49-56     F8.0
                  (0.0 < ALPHALL < 1.0)

       AINCR      Type I error value increment                       57-64     F8.0
                  (AINCR > 0.0)

       ALPHAUL    Upper limit of Type I error values                 65-72     F8.0
                  (0.0 < ALPHAUL < 1.0)

       TYPE       Test type, i.e.,                                   73-80     F8.0
                  =1, for one-sided test
                  (hypothesis (2) or (3))
                  =2, for two-sided test
                  (hypothesis (1))
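
The fixed-column layout above can be produced programmatically. The sketch below is one hypothetical way to write a NOR1PW input file from Python; the function name and dictionary keys are assumptions, and it relies on the usual Fortran behaviour that a value punched with an explicit decimal point is read correctly under an F8.0 edit descriptor.

```python
FIELDS = ["DELTALL", "DINCR", "DELTAUL", "FNLL", "FNINCR", "FNUL",
          "ALPHALL", "AINCR", "ALPHAUL", "TYPE"]

def nor1pw_input(cases):
    """Return the text of an input file: one type 1 record, then one
    80-column type 2 record per case (ten right-justified 8-column fields)."""
    lines = [f"{len(cases):5d}"]                  # NCASES in columns 1-5 (I5)
    for case in cases:
        record = "".join(f"{case[name]:8.3f}" for name in FIELDS)
        if len(record) != 80:
            raise ValueError("a field does not fit in 8 columns")
        lines.append(record)
    return "\n".join(lines) + "\n"

example = {"DELTALL": 0.25, "DINCR": 0.25, "DELTAUL": 2.0,
           "FNLL": 5, "FNINCR": 5, "FNUL": 100,
           "ALPHALL": 0.01, "AINCR": 0.04, "ALPHAUL": 0.05, "TYPE": 1}
print(nor1pw_input([example]), end="")
```

Three decimal places are used here only to keep every field within eight columns; values needing more precision or more digits would require a different format.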

One record type 2 is required for each case to be processed.

NOR2PWE


PURPOSE


Program NOR2PWE (Normal Power, 2 Populations, Equal Sample Sizes) evaluates
the power function for tests of hypotheses regarding the equality of the means of two
normal populations. The tests are based on two independent random samples each of size
n from two normal populations having known variances σ_1² and σ_2². Let X_1 and X_2 be
normal random variables with unknown means μ_1 and μ_2 and known variances σ_1² and σ_2²,
respectively. Then the sample means $\bar{X}_1$ and $\bar{X}_2$ are normal random variables with
unknown means μ_1 and μ_2 and known variances σ_1²/n and σ_2²/n, respectively, where
$\bar{X}_i = \sum_{j=1}^{n} X_{ij}/n$. These tests are commonly referred to as two-sample Z tests (see Reference
1). The three relevant hypotheses and their corresponding test statistics, critical regions,
and power functions are given below. In each case,

α = size of the critical region

  = Prob(committing a Type I error)

and the test statistic

$$T_0 = Z_0 = (\bar{X}_1 - \bar{X}_2)/\sqrt{(\sigma_1^2 + \sigma_2^2)/n}\,.$$

The probability density function of T_0 is normal with a variance of 1. Its mean depends on
the true value of the difference d = μ_1 - μ_2 (d = d_0 or d = d_1 ≠ d_0). d_0 = 0 is the hypothesized
value of d. For d = d_i, this density will be denoted by g(t_0 | d_i). Define
$\delta = |d_1|/\sqrt{\sigma_1^2 + \sigma_2^2}$. Then g(t_0 | d_i) has the form

$$g(t_0 \mid d_i) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}\left[t_0 - \sqrt{n}\,d_i/\sqrt{\sigma_1^2 + \sigma_2^2}\right]^2\right\}, \qquad -\infty < t_0 < \infty,$$

so that the mean of T_0 is 0 for d_i = d_0 and has absolute value $\sqrt{n}\,\delta$ for d_i = d_1.

The critical region for the test is based on g(t_0 | d_0) and the power function is expressed in
terms of g(t_0 | d_1).


*********************

Hypothesis (1):    H_0: μ_1 = μ_2
                   H_1: μ_1 ≠ μ_2.

Critical Region:   Reject H_0 if |T_0| > z_{1-α/2}

where z_{1-α/2} is the 100(1 - α/2)th percentage point of the standard normal
distribution (i.e., mean 0 and variance 1). Hence,

$$\alpha/2 = \int_{z_{1-\alpha/2}}^{\infty} g(t_0 \mid d_0)\,dt_0.$$

Power Function:

$$\Pi(d_1,n) = \int_{-\infty}^{z_{\alpha/2}} g(t_0 \mid d_1)\,dt_0 + \int_{z_{1-\alpha/2}}^{\infty} g(t_0 \mid d_1)\,dt_0.$$

*********************

Hypothesis (2):    H_0: μ_1 ≤ μ_2
                   H_1: μ_1 > μ_2.

Critical Region:   Reject H_0 if T_0 > z_{1-α}

Hence,

$$\alpha = \int_{z_{1-\alpha}}^{\infty} g(t_0 \mid d_0)\,dt_0.$$

Power Function:

$$\Pi(d_1,n) = \int_{z_{1-\alpha}}^{\infty} g(t_0 \mid d_1)\,dt_0.$$

*********************

Hypothesis (3):    H_0: μ_1 ≥ μ_2
                   H_1: μ_1 < μ_2.

Critical Region:   Reject H_0 if T_0 < z_α

Hence,

$$\alpha = \int_{-\infty}^{z_{\alpha}} g(t_0 \mid d_0)\,dt_0.$$

Power Function:

$$\Pi(d_1,n) = \int_{-\infty}^{z_{\alpha}} g(t_0 \mid d_1)\,dt_0.$$
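
For equal sample sizes the power depends on the inputs only through $\sqrt{n}\,\delta$ with $\delta = |d_1|/\sqrt{\sigma_1^2 + \sigma_2^2}$. A minimal Python sketch follows; the function name and argument order are illustrative assumptions rather than the NOR2PWE interface.

```python
from math import sqrt
from statistics import NormalDist

def nor2pwe_power(d1, sigma1, sigma2, n, alpha, two_sided):
    """Power of the two-sample Z test with n observations per sample."""
    std = NormalDist()
    delta = abs(d1) / sqrt(sigma1**2 + sigma2**2)
    shift = sqrt(n) * delta          # absolute mean of T0 when d = d1
    if two_sided:                    # hypothesis (1)
        z = std.inv_cdf(1 - alpha / 2)
        return std.cdf(-z - shift) + 1 - std.cdf(z - shift)
    z = std.inv_cdf(1 - alpha)       # hypothesis (2) or (3)
    return 1 - std.cdf(z - shift)

print(nor2pwe_power(d1=1.0, sigma1=1.0, sigma2=1.0, n=8, alpha=0.05, two_sided=False))
```

The one-sided branch assumes d_1 lies in the direction of the alternative, which is why only its absolute value enters.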




FEATURES 

NOR2PWE  allows  the  user  to  process  up  to  25  cases  in  a  single  computer  run.  Each
case  corresponds  to  a  single  range  of  the  user-specified  Type  I  errors.  In  NOR2PWE  the 
user  specifies  the  type  of  test  (i.e.,  one-  or  two-sided),  and  a  range  of  values  for  sample  size, 
Type  I  error,  and  delta,  where  delta  is  the  number  of  standard  deviations  that  the  true  mean 
is  from  the  hypothesized  mean  in  absolute  value.  Hypotheses  (2)  and  (3)  are  one-sided 
tests  while  hypothesis  (1)  is  a  two-sided  test.  Each  page  of  NOR2PWE  output  consists  of 
the  following: 

*  A  power  table  for  the  specified  test  type  and  Type  I  error  for  up  to  40  values  of  n 
and  up  to  9  values  of  delta. 


REFERENCES 

1.  Bowker,  A.  H.  and  Lieberman,  G.  J.  (1972),  Engineering  Statistics,  Second  Edition, 
Prentice-Hall,  Inc.,  pp.  225  -  235. 

2.  Freund,  John  E.  (1962),  Mathematical  Statistics,  Prentice-Hall,  Inc.,  pp.  266  -  267. 

3.  Walpole,  R.  E.  and  Myers,  R.  H.  (1985),  Probability  and  Statistics  for  Engineers  and 
Scientists,  Third  Edition,  Macmillan,  pp.  283  -  284. 


INPUT  GUIDE 

The  user  must  create  a  data  file  containing  the  following  record  types.  The  first  record 
type  specifies  the  number  of  cases  (NCASES)  to  be  run.  The  second  record  type  specifies 
the  limits  for  delta,  sample  size,  and  Type  I  error  together  with  the  test  type.  If  NCASES 
is  greater  than  one,  record  type  2  must  be  repeated  for  each  case. 

More  specifically  the  required  data  file  is  constructed  as  follows: 


Record
Type   Variable   Description                                        Columns   Format

1      NCASES     Number of cases                                    1-5       I5

2      DELTALL    Lower limit of delta values                        1-8       F8.0
                  (DELTALL > 0.0)

       DINCR      Delta value increment                              9-16      F8.0
                  (DINCR > 0.0)

       DELTAUL    Upper limit of delta values                        17-24     F8.0
                  (DELTAUL > 0.0)

       FNLL       Lower limit of sample size values                  25-32     F8.0
                  (FNLL > 0)

       FNINCR     Sample size value increment                        33-40     F8.0
                  (FNINCR > 0)

       FNUL       Upper limit of sample size values                  41-48     F8.0
                  (FNUL > 0)

       ALPHALL    Lower limit of Type I error values                 49-56     F8.0
                  (0.0 < ALPHALL < 1.0)

       AINCR      Type I error value increment                       57-64     F8.0
                  (AINCR > 0.0)

       ALPHAUL    Upper limit of Type I error values                 65-72     F8.0
                  (0.0 < ALPHAUL < 1.0)

       TYPE       Test type, i.e.,                                   73-80     F8.0
                  =1, for one-sided test
                  (hypothesis (2) or (3))
                  =2, for two-sided test
                  (hypothesis (1))

One record type 2 is required for each case to be processed.

NOR2PWU


PURPOSE


Program NOR2PWU (Normal Power, 2 Populations, Unequal Sample Sizes) evaluates
the power function for tests of hypotheses regarding the equality of the means of two
normal populations. The tests are based on two independent random samples of sizes
n_1 and n_2, respectively, from two normal populations having known variances σ_1² and σ_2². If
the sample sizes are equal, program NOR2PWE should be used. Let X_1 and X_2 be normal
random variables with unknown means μ_1 and μ_2 and known variances σ_1² and σ_2²,
respectively. Then the sample means $\bar{X}_1$ and $\bar{X}_2$ are normal random variables with unknown
means μ_1 and μ_2 and known variances σ_1²/n_1 and σ_2²/n_2, respectively, where
$\bar{X}_i = \sum_{j=1}^{n_i} X_{ij}/n_i$.
These tests are commonly referred to as two-sample Z tests (see Reference 1). The three
relevant hypotheses and their corresponding test statistics, critical regions, and power
functions are given below. In each case,

α = size of the critical region

  = Prob(committing a Type I error)

and the test statistic

$$T_0 = Z_0 = (\bar{X}_1 - \bar{X}_2)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}\,.$$

The probability density function of T_0 is normal with a variance of 1. Its mean depends on
the true value of the difference d = μ_1 - μ_2 (d = d_0 or d = d_1 ≠ d_0). d_0 = 0 is the hypothesized
value of d. For d = d_i, this density will be denoted by g(t_0 | d_i). Define δ = |d_1|. Then
g(t_0 | d_i) has the form

$$g(t_0 \mid d_i) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}\left[t_0 - d_i/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}\right]^2\right\}, \qquad -\infty < t_0 < \infty.$$

The critical region for the test is based on g(t_0 | d_0) and the power function is expressed in
terms of g(t_0 | d_1).


Hypothesis (1):    H_0: μ_1 = μ_2
                   H_1: μ_1 ≠ μ_2.

Critical Region:   Reject H_0 if |T_0| > z_{1-α/2}

where z_{1-α/2} is the 100(1 - α/2)th percentage point of the standard normal
distribution (i.e., mean 0 and variance 1). Hence,

$$\alpha/2 = \int_{z_{1-\alpha/2}}^{\infty} g(t_0 \mid d_0)\,dt_0.$$

Power Function:

$$\Pi(d_1,n_1,n_2) = \int_{-\infty}^{z_{\alpha/2}} g(t_0 \mid d_1)\,dt_0 + \int_{z_{1-\alpha/2}}^{\infty} g(t_0 \mid d_1)\,dt_0.$$

*********************

Hypothesis (2):    H_0: μ_1 ≤ μ_2
                   H_1: μ_1 > μ_2.

Critical Region:   Reject H_0 if T_0 > z_{1-α}

Hence,

$$\alpha = \int_{z_{1-\alpha}}^{\infty} g(t_0 \mid d_0)\,dt_0.$$

Power Function:

$$\Pi(d_1,n_1,n_2) = \int_{z_{1-\alpha}}^{\infty} g(t_0 \mid d_1)\,dt_0.$$

*********************

Hypothesis (3):    H_0: μ_1 ≥ μ_2
                   H_1: μ_1 < μ_2.

Critical Region:   Reject H_0 if T_0 < z_α

Hence,

$$\alpha = \int_{-\infty}^{z_{\alpha}} g(t_0 \mid d_0)\,dt_0.$$

Power Function:

$$\Pi(d_1,n_1,n_2) = \int_{-\infty}^{z_{\alpha}} g(t_0 \mid d_1)\,dt_0.$$

FEATURES

NOR2PWU  allows  the  user  to  process  up  to  25  cases  in  a  single  computer  run.  Each 
case  corresponds  to  a  single  range  of  the  user-specified  Type  I  errors.  NOR2PWU  provides 
the  user  with  the  option  of  either  computing  a  single  power  for  a  specified  sample  size  pair 
or  printing  a  table  of  power  values.  The  known  values  of  the  population  standard  deviations 
are  always  required.  If  the  single  power  option  is  chosen,  the  user  must  additionally  specify 
the  type  of  test  (i.e.,  one-  or  two-sided),  a  sample  size  pair,  and  a  value  of  delta,  where  delta 
is  the  absolute  difference  in  the  true  means.  Hypotheses  (2)  and  (3)  are  one-sided  tests 
while  hypothesis  (1)  is  a  two-sided  test.  If  the  power  table  option  is  chosen,  the  user  instead 
specifies  the  test  type  and  a  range  of  values  for  the  first  population  sample  size,  Type  I 
error,  and  delta.  In  addition,  a  multiplicative  factor  must  be  provided  which  relates  the 
second  population  sample  size  to  the  first.  Under  the  power  table  option  each  page  of 
NOR2PWU  output  consists  of  the  following: 

*  A power table for the specified test type and Type I error for up to 40 values of
the pair (n_1, n_2) and up to 9 values of delta.
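
The power table option can be illustrated with a short Python sketch. It assumes, as stated in the input guide, that the second sample size is obtained as n_2 = FACTOR*n_1 and that delta is the absolute difference in the true means; the function names and the choice to leave n_2 unrounded are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def nor2pwu_power(delta, sigma1, sigma2, n1, n2, alpha, two_sided):
    """Power of the two-sample Z test with sample sizes n1 and n2."""
    std = NormalDist()
    shift = delta / sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    if two_sided:                    # hypothesis (1)
        z = std.inv_cdf(1 - alpha / 2)
        return std.cdf(-z - shift) + 1 - std.cdf(z - shift)
    z = std.inv_cdf(1 - alpha)       # hypothesis (2) or (3)
    return 1 - std.cdf(z - shift)

def power_table(deltas, n1_values, factor, sigma1, sigma2, alpha, two_sided):
    """One row per n1 (with n2 = factor * n1), one column per delta."""
    return [[nor2pwu_power(d, sigma1, sigma2, n1, factor * n1, alpha, two_sided)
             for d in deltas]
            for n1 in n1_values]

table = power_table([0.5, 1.0, 1.5], [5, 10, 20], factor=2.0,
                    sigma1=1.0, sigma2=1.5, alpha=0.05, two_sided=True)
for n1, row in zip([5, 10, 20], table):
    print(n1, [round(p, 4) for p in row])
```

With factor = 1 and equal variances the result agrees with the equal-sample-size formula, which is a convenient check.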


REFERENCES 

1.  Bowker,  A.  H.  and  Lieberman,  G.  J.  (1972),  Engineering  Statistics,  Second  Edition, 
Prentice-Hall,  Inc.,  pp.  225  -  235. 

2.  Freund,  John  E.  (1962),  Mathematical  Statistics,  Prentice-Hall,  Inc.,  pp.  266  -  267. 


INPUT  GUIDE 


The user must create a data file containing the following record types. The first record
type specifies the number of cases (NCASES) to be run. The second record type specifies
the known population standard deviations. The third record type specifies the power
computation option. The form of the fourth record type depends on that option. If the
single power computation option is chosen (IOPTION = 1), record type 4A specifies the
delta value, the two sample sizes, and the Type I error together with the test type. If the
power table option is chosen (IOPTION = 0), record type 4B specifies the limits for delta,
the first population sample size, and the Type I error together with the test type, and a
fifth record type is also required. The fifth record type specifies the value of the
multiplicative factor relating the second population sample size to the first. If NCASES is
greater than one, record types 2, 3, 4 (4A or 4B) and, if necessary, 5 must be repeated for
each case.


More  specifically  the  required  data  file  is  constructed  as  follows: 

Record
Type   Variable   Description                                        Columns   Format

1      NCASES     Number of cases                                    1-5       I5

2      SIGMA1     1st population standard deviation                  1-20      F20.0
                  (SIGMA1 > 0.0)

       SIGMA2     2nd population standard deviation                  21-40     F20.0
                  (SIGMA2 > 0.0)

3      IOPTION    Power computation option                           1-5       I5
                  =0, for power table output
                  =1, for single power value

The following record type is included only if IOPTION = 1.

4A     D(1)       Delta value                                        1-8       F8.0
                  (D(1) > 0.0)

       FN(1)      1st population sample size                         9-16      F8.0
                  (FN(1) > 0)

       FN2        2nd population sample size                         17-24     F8.0
                  (FN2 > 0)

       ALPH(1)    Type I error                                       25-32     F8.0
                  (0.0 < ALPH(1) < 1.00)

       TYPE       Test type, i.e.,                                   33-40     F8.0
                  =1, for one-sided test
                  (hypothesis (2) or (3))
                  =2, for two-sided test
                  (hypothesis (1))

The following record type is included only if IOPTION = 0.

4B     DELTALL    Lower limit of delta values                        1-8       F8.0
                  (DELTALL > 0.0)

       DINCR      Delta value increment                              9-16      F8.0
                  (DINCR > 0.0)

       DELTAUL    Upper limit of delta values                        17-24     F8.0
                  (DELTAUL > 0.0)

       FNLL       Lower limit of first population sample size        25-32     F8.0
                  values
                  (FNLL > 0)

       FNINCR     First population sample size value increment       33-40     F8.0
                  (FNINCR > 0)

       FNUL       Upper limit of first population sample size        41-48     F8.0
                  values
                  (FNUL > 0)

       ALPHALL    Lower limit of Type I error values                 49-56     F8.0
                  (0.0 < ALPHALL < 1.0)

       AINCR      Type I error value increment                       57-64     F8.0
                  (AINCR > 0.0)

       ALPHAUL    Upper limit of Type I error values                 65-72     F8.0
                  (0.0 < ALPHAUL < 1.0)

       TYPE       Test type, i.e.,                                   73-80     F8.0
                  =1, for one-sided test
                  (hypothesis (2) or (3))
                  =2, for two-sided test
                  (hypothesis (1))

Record type 5 is included only if IOPTION = 0.

5      FACTOR     Sample size multiplicative factor. Satisfies       1-20      F20.0
                  the relationship n_2 = FACTOR*n_1.
                  (FACTOR > 0.0)

One  record  type  2,  3,  and  4  (4A  or  4B)  is  required  for  each  case  to  be  processed.  One 
record  type  5  is  required  for  each  cased  in  which  IOPTION  =  0. 
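
The column layout above can be illustrated with a short Python sketch that writes a one-case NOR2PWU input file for the single power option (records 1, 2, 3, and 4A). The helper names `f_field`, `i_field`, and `single_power_case` and the sample values are hypothetical and are not part of STATLIB. The sketch assumes that each real value is written with an explicit decimal point, right-justified in its field, which a FORTRAN F-format read accepts.

```python
def f_field(value, width):
    """Right-justify a real number, with an explicit decimal point, in `width` columns."""
    text = f"{value:.4f}".rjust(width)
    if len(text) > width:
        raise ValueError(f"{value} does not fit in {width} columns")
    return text


def i_field(value, width=5):
    """Right-justify an integer in `width` columns (I5 by default)."""
    text = str(int(value)).rjust(width)
    if len(text) > width:
        raise ValueError(f"{value} does not fit in {width} columns")
    return text


def single_power_case(sigma1, sigma2, delta, n1, n2, alpha, test_type):
    """Return records 2, 3, and 4A for one case with IOPTION = 1."""
    rec2 = f_field(sigma1, 20) + f_field(sigma2, 20)   # SIGMA1, SIGMA2
    rec3 = i_field(1)                                  # IOPTION = 1
    rec4a = "".join(f_field(v, 8) for v in (delta, n1, n2, alpha, test_type))
    return [rec2, rec3, rec4a]


# Hypothetical case: sigma1 = 2, sigma2 = 3, delta = 0.5, n1 = 20, n2 = 30,
# alpha = 0.05, two-sided test (TYPE = 2).
cases = [single_power_case(2.0, 3.0, 0.5, 20, 30, 0.05, 2)]
lines = [i_field(len(cases))] + [rec for case in cases for rec in case]
print("\n".join(lines))
```

Record 1 carries NCASES, so the list of cases is assembled first and its length is written as the first line.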
T1POW

PURPOSE 

Program T1POW (T Power, 1 Population) evaluates the power function for tests of
hypotheses regarding the mean of a normal population. The tests are based on a random
sample of size $n$ from a normal population having unknown variance $\sigma^2$. Let $X$ be a
normal random variable with unknown mean $\mu$ and unknown variance $\sigma^2$. Then the sample
mean $\bar{X}$ is a normal random variable with unknown mean $\mu$ and unknown variance
$\sigma^2/n$, where $\bar{X} = \sum_{i=1}^{n} X_i/n$. These tests are commonly referred to as
one-sample t tests or small-sample t tests (see Reference 1). The three relevant hypotheses
and their corresponding test statistics, critical regions, and power functions are given
below. In each case,

$\alpha$ = size of the critical region
  = Prob(committing a Type I error)
and the test statistic

$$T_0 = t_0 = (\bar{X} - \mu_0)/(S/\sqrt{n})$$

where $\mu_0$ is the hypothesized value of the true mean $\mu$. $S$ is the positive square root
of $S^2$, the unbiased estimator of $\sigma^2$, where $S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)$.
The probability density function of $T_0$ depends on the true value of the parameter $\mu$
($\mu_0$ or $\mu_1 \ne \mu_0$). For $\mu = \mu_1$, this density will be denoted by
$g(t_0 \mid \mu_1)$. Define $\delta = |\mu_1 - \mu_0|/\sigma$. Then $g(t_0 \mid \mu_1)$ is a
non-central t distribution with parameters $\nu$ (degrees of freedom) and
$\delta' = \delta\sqrt{n}$ (non-centrality parameter) and is given by

$$g(t_0 \mid \mu_1) = \frac{\nu^{\nu/2} e^{-\delta'^2/2}}{\sqrt{\pi}\,\Gamma(\nu/2)}
  \sum_{r=0}^{\infty} \Gamma\!\left(\frac{\nu+r+1}{2}\right) \frac{(\delta')^r}{r!}\,
  2^{r/2}\, t_0^{\,r}\, (\nu + t_0^2)^{-(\nu+r+1)/2}$$

Here $\nu = n - 1$. When $\mu_1 = \mu_0$, $\delta' = 0$ and $g(t_0 \mid \mu_1)$ reduces to the
central t distribution with $n - 1$ degrees of freedom. Otherwise, $\delta'$ is non-zero. The
critical region for the test is based on $g(t_0 \mid \mu_0)$ and the power function is
expressed in terms of $g(t_0 \mid \mu_1)$.

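Hypothesis (1):    $H_0$: $\mu = \mu_0$
                   $H_1$: $\mu \ne \mu_0$.

Critical Region:   Reject $H_0$ if $|T_0| > t_{\nu,1-\alpha/2}$,
                   where $t_{\nu,1-\alpha/2}$ is the $100(1-\alpha/2)$th percentage point of the
                   central t distribution with $\nu$ degrees of freedom. Hence,

                   $$\alpha/2 = \int_{t_{\nu,1-\alpha/2}}^{\infty} g(t_0 \mid \mu_0)\,dt_0 .$$

Power Function:    $$\Pi(\mu_1, n) = \int_{-\infty}^{t_{\nu,\alpha/2}} g(t_0 \mid \mu_1)\,dt_0
                     + \int_{t_{\nu,1-\alpha/2}}^{\infty} g(t_0 \mid \mu_1)\,dt_0 .$$

****************

Hypothesis (2):    $H_0$: $\mu \le \mu_0$
                   $H_1$: $\mu > \mu_0$.

Critical Region:   Reject $H_0$ if $T_0 > t_{\nu,1-\alpha}$. Hence,

                   $$\alpha = \int_{t_{\nu,1-\alpha}}^{\infty} g(t_0 \mid \mu_0)\,dt_0 .$$

Power Function:    $$\Pi(\mu_1, n) = \int_{t_{\nu,1-\alpha}}^{\infty} g(t_0 \mid \mu_1)\,dt_0 .$$

****************

Hypothesis (3):    $H_0$: $\mu \ge \mu_0$
                   $H_1$: $\mu < \mu_0$.

Critical Region:   Reject $H_0$ if $T_0 < t_{\nu,\alpha}$. Hence,

                   $$\alpha = \int_{-\infty}^{t_{\nu,\alpha}} g(t_0 \mid \mu_0)\,dt_0 .$$

Power Function:    $$\Pi(\mu_1, n) = \int_{-\infty}^{t_{\nu,\alpha}} g(t_0 \mid \mu_1)\,dt_0 .$$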
FEATURES

T1POW  allows  the  user  to  process  up  to  25  cases  in  a  single  computer  run.  Each  case 
corresponds  to  a  single  range  of  the  user-specified  Type  I  errors.  In  T1POW  the  user 
specifies  the  type  of  test  (i.e.,  one-  or  two-sided),  and  a  range  of  values  for  sample  size, 
Type  I  error,  and  delta,  where  delta  is  the  number  of  standard  deviations  that  the  true  mean 
is  from  the  hypothesized  mean  in  absolute  value.  Hypotheses  (2)  and  (3)  are  one-sided  tests 
while hypothesis (1) is a two-sided test. Each page of T1POW output consists of the
following:

*  A power table for the specified test type and Type I error for up to 40 values of n
and up to 9 values of delta.


REFERENCES 

1.  Bowker,  A.  H.  and  Lieberman,  G.  J.  (1972),  Engineering  Statistics,  Second  Edition, 
Prentice-Hall,  Inc.,  pp.  198  -  207. 

2.  Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and
Engineering, Second Edition, John Wiley and Sons, Inc., pp. 295 - 296.

3.  DiDonato,  A.  R.  (1988),  unpublished  notes  and  computer  code  for  numerically 
evaluating  a  specialized  infinite  sum  occurring  in  the  expression  for  the  power  of  the 
t  test. 

4.  Freund,  John  E.  (1962),  Mathematical  Statistics,  Prentice-Hall,  Inc.,  pp.  263  -  264. 

5.  Thomas,  M.  A.  (1988),  unpublished  notes  deriving  an  expression  for  the  power  of  the 
t  test  in  terms  of  the  incomplete  beta  function. 

6.  Wine,  R.  L.  (1964),  Statistics  for  Scientists  and  Engineers,  Prentice-Hall,  Inc.,  pp. 
254  -  260. 


INPUT  GUIDE 

The  user  must  create  a  data  file  containing  the  following  record  types.  The  first  record 
type  specifies  the  number  of  cases  (NCASES)  to  be  run.  The  second  record  type  specifies 
the  limits  for  delta,  the  sample  size,  and  Type  I  error  together  with  the  test  type.  If 
NCASES  is  greater  than  one,  record  type  2  must  be  repeated  for  each  case. 

More  specifically  the  required  data  file  is  constructed  as  follows: 
Record
Type   Variable  Description                                     Columns  Format

1      NCASES    Number of cases                                 1-5      I5

2      DELTALL   Lower limit of delta values                     1-8      F8.0
                 (DELTALL > 0.0)
       DINCR     Delta value increment                           9-16     F8.0
                 (DINCR > 0.0)
       DELTAUL   Upper limit of delta values                     17-24    F8.0
                 (DELTAUL > 0.0)
       FNLL      Lower limit of sample size values               25-32    F8.0
                 (FNLL > 0)
       FNINCR    Sample size value increment                     33-40    F8.0
                 (FNINCR > 0)
       FNUL      Upper limit of sample size values               41-48    F8.0
                 (FNUL > 0)
       ALPHALL   Lower limit of Type I error values              49-56    F8.0
                 (0.0 < ALPHALL < 1.0)
       AINCR     Type I error value increment                    57-64    F8.0
                 (AINCR > 0.0)
       ALPHAUL   Upper limit of Type I error values              65-72    F8.0
                 (0.0 < ALPHAUL < 1.0)
       TYPE      Test type, i.e.,                                73-80    F8.0
                 = 1, for one-sided test
                      (hypothesis (2) or (3))
                 = 2, for two-sided test
                      (hypothesis (1))

One record type 2 is required for each case to be processed.
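
Record type 2 describes each quantity by a lower limit, an increment, and an upper limit. The Python sketch below shows one way such a triple could expand into a list of values. It assumes the values are lower, lower + increment, and so on, up to but not exceeding the upper limit; the manual does not state how T1POW treats an upper limit that is not reached exactly. The function name `expand` and the sample limits are hypothetical.

```python
def expand(lower, increment, upper, tol=1e-9):
    """Values lower, lower + increment, ... that do not exceed upper (within tol)."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    if upper < lower - tol:
        return []
    count = int((upper - lower) / increment + tol) + 1
    # Rounding hides floating-point noise such as 0.30000000000000004.
    return [round(lower + k * increment, 10) for k in range(count)]


deltas = expand(0.5, 0.5, 2.0)      # DELTALL, DINCR, DELTAUL
sizes = expand(5, 5, 50)            # FNLL, FNINCR, FNUL
alphas = expand(0.01, 0.04, 0.05)   # ALPHALL, AINCR, ALPHAUL

print("delta values:", deltas)
print("sample sizes:", sizes)
print("Type I errors:", alphas)
# One output page holds up to 40 values of n and up to 9 values of delta.
print("fits on one page per Type I error:", len(sizes) <= 40 and len(deltas) <= 9)
```

Counting the values before submitting a run is a quick check of the request against the 40 by 9 page layout described under FEATURES.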
COMMENTS

Since the integrals in the expressions for power cannot be evaluated in closed form,
they must be evaluated numerically. For each of the three hypotheses it can be shown that
the power expression can be transformed to an expression involving the incomplete beta
function $I_x(a,b)$, where

$$I_x(a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt ,
  \qquad 0 \le x \le 1, \quad a, b > 0 .$$

Very  good  numerical  routines  are  available  for  evaluating  Ix(a,b)  such  as  BRATIO  in 
MATHLIB.  Transforming  the  power  expressions  results  in  two  expressions  for  power,  one 
for  the  one-sided  test  (hypothesis  (2)  or  (3))  and  one  for  the  two-sided  test  (hypothesis  (1)). 
These  expressions  are  given  below: 


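(1) One-sided test

$$\text{Power} = 0.5\left[ e^{-\delta'^2/2} \sum_{i=0}^{\infty} \frac{(\delta'^2/2)^i}{\Gamma(i+1)}\,
  I_{1-x}(\nu/2,\, i+0.5)
  + e^{-\delta'^2/2} \sum_{i=0}^{\infty} \frac{(\delta'^2/2)^{i+1/2}}{\Gamma(i+3/2)}\,
  I_{1-x}(\nu/2,\, i+1) \right]$$

(2) Two-sided test

$$\text{Power} = e^{-\delta'^2/2} \sum_{i=0}^{\infty} \frac{(\delta'^2/2)^i}{\Gamma(i+1)}\,
  I_{1-x}(\nu/2,\, i+0.5)$$

Note that $I_{1-x}(a,b) = 1 - I_x(b,a)$. Program T1POW utilizes the above expressions with
$\nu = n - 1$ and $\delta' = \delta\sqrt{n}$ to compute the power of the one-sample t test.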
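
The series above can be evaluated with a few lines of Python. The following is a minimal sketch, not the T1POW code: it takes $x = t_c^2/(\nu + t_c^2)$, where $t_c$ is the critical value of the test (an assumption, since the text does not define $x$), and replaces BRATIO with a textbook continued-fraction routine. All function names are hypothetical, and the one-sided case requires a Type I error below 0.5.

```python
import math


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b), by continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a) keeps the fraction fast.
        return 1.0 - betainc(b, a, 1.0 - x)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x)) / a
    tiny = 1.0e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1.0e-14:
            break
    return front * h


def t_critical(v, tail):
    """t with P(|T| > t) = tail for the central t distribution, by bisection."""
    lo, hi = 0.0, 1.0e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if betainc(0.5 * v, 0.5, v / (v + mid * mid)) > tail:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_power(delta_prime, v, alpha, two_sided):
    """Power of the t test from the incomplete beta series (delta_prime >= 0)."""
    tc = t_critical(v, alpha if two_sided else 2.0 * alpha)
    y = v / (v + tc * tc)                 # y = 1 - x
    h = 0.5 * delta_prime ** 2
    p = math.exp(-h)                      # coefficient of the first sum, i = 0
    q = p * math.sqrt(h) / math.gamma(1.5)  # coefficient of the second sum, i = 0
    sum_p = sum_q = 0.0
    for i in range(2000):
        sum_p += p * betainc(0.5 * v, i + 0.5, y)
        sum_q += q * betainc(0.5 * v, i + 1.0, y)
        if i > h and p + q < 1.0e-16:
            break
        p *= h / (i + 1.0)
        q *= h / (i + 1.5)
    return sum_p if two_sided else 0.5 * (sum_p + sum_q)


n, delta, alpha = 10, 1.0, 0.05
print("one-sided:", t_power(delta * math.sqrt(n), n - 1, alpha, two_sided=False))
print("two-sided:", t_power(delta * math.sqrt(n), n - 1, alpha, two_sided=True))
```

A useful sanity check is that both expressions return the Type I error itself when $\delta' = 0$, since only the $i = 0$ term of the first sum survives.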
T2POW


PURPOSE 


Program T2POW (T Power, 2 Populations) evaluates the power function for tests of
hypotheses regarding the equality of the means of two normal populations. The tests are
based on two independent random samples of sizes $n_1$ and $n_2$, respectively, from two
normal populations having unknown but equal variances. Let $X_1$ and $X_2$ be normal random
variables with unknown means $\mu_1$ and $\mu_2$ and unknown common variance $\sigma^2$. Then
the sample means $\bar{X}_1$ and $\bar{X}_2$ are normal random variables with unknown means
$\mu_1$ and $\mu_2$ and unknown variances $\sigma^2/n_1$ and $\sigma^2/n_2$, respectively, where
$\bar{X}_i = \sum_{j=1}^{n_i} X_{ij}/n_i$. These tests are commonly referred to as two-sample
t tests or "pooled" t tests (see Reference 1). The three relevant hypotheses and their
corresponding test statistics, critical regions, and power functions are given below. In
each case,

$\alpha$ = size of the critical region
  = Prob(committing a Type I error)

and the test statistic

$$T_0 = t_0 = (\bar{X}_1 - \bar{X}_2)/\left(S_p\sqrt{1/n_1 + 1/n_2}\right)$$

where $S_p^2$ is the "pooled" estimator of the common variance $\sigma^2$ and is given by

$$S_p^2 = \left[\sum_{j=1}^{n_1}(X_{1j} - \bar{X}_1)^2 + \sum_{j=1}^{n_2}(X_{2j} - \bar{X}_2)^2\right]
  /(n_1 + n_2 - 2) .$$

The probability density function of $T_0$ depends on the true value of the difference
$d = \mu_1 - \mu_2$ ($d = d_0$ or $d = d_1 \ne d_0$). $d_0 = 0$ is the hypothesized value of $d$.
For $d = d_1$, this density will be denoted by $g(t_0 \mid d_1)$. Define $\delta = |d_1|/\sigma$.
Then $g(t_0 \mid d_1)$ is a non-central t distribution with parameters $\nu$ (degrees of
freedom) and $\delta' = \delta/\sqrt{1/n_1 + 1/n_2}$ (non-centrality parameter) and is given by

$$g(t_0 \mid d_1) = \frac{\nu^{\nu/2} e^{-\delta'^2/2}}{\sqrt{\pi}\,\Gamma(\nu/2)}
  \sum_{r=0}^{\infty} \Gamma\!\left(\frac{\nu+r+1}{2}\right) \frac{(\delta')^r}{r!}\,
  2^{r/2}\, t_0^{\,r}\, (\nu + t_0^2)^{-(\nu+r+1)/2}$$

Here $\nu = n_1 + n_2 - 2$. When $d_1 = d_0 = 0$, $\delta' = 0$ and $g(t_0 \mid d_1)$ reduces to
the central t distribution with $n_1 + n_2 - 2$ degrees of freedom. Otherwise, $\delta'$ is
non-zero. The critical region for the test is based on $g(t_0 \mid d_0)$ and the power
function is expressed in terms of $g(t_0 \mid d_1)$.
*********************

Hypothesis (1):    $H_0$: $\mu_1 = \mu_2$
                   $H_1$: $\mu_1 \ne \mu_2$.

Critical Region:   Reject $H_0$ if $|T_0| > t_{\nu,1-\alpha/2}$,
                   where $t_{\nu,1-\alpha/2}$ is the $100(1-\alpha/2)$th percentage point of the
                   central t distribution with $\nu$ degrees of freedom. Hence,

                   $$\alpha/2 = \int_{t_{\nu,1-\alpha/2}}^{\infty} g(t_0 \mid d_0)\,dt_0 .$$

Power Function:    $$\Pi(d_1, n_1, n_2) = \int_{-\infty}^{t_{\nu,\alpha/2}} g(t_0 \mid d_1)\,dt_0
                     + \int_{t_{\nu,1-\alpha/2}}^{\infty} g(t_0 \mid d_1)\,dt_0 .$$

*********************

Hypothesis (2):    $H_0$: $\mu_1 \le \mu_2$
                   $H_1$: $\mu_1 > \mu_2$.

Critical Region:   Reject $H_0$ if $T_0 > t_{\nu,1-\alpha}$. Hence,

                   $$\alpha = \int_{t_{\nu,1-\alpha}}^{\infty} g(t_0 \mid d_0)\,dt_0 .$$

Power Function:    $$\Pi(d_1, n_1, n_2) = \int_{t_{\nu,1-\alpha}}^{\infty} g(t_0 \mid d_1)\,dt_0 .$$

*********************

Hypothesis (3):    $H_0$: $\mu_1 \ge \mu_2$
                   $H_1$: $\mu_1 < \mu_2$.

Critical Region:   Reject $H_0$ if $T_0 < t_{\nu,\alpha}$. Hence,

                   $$\alpha = \int_{-\infty}^{t_{\nu,\alpha}} g(t_0 \mid d_0)\,dt_0 .$$

Power Function:    $$\Pi(d_1, n_1, n_2) = \int_{-\infty}^{t_{\nu,\alpha}} g(t_0 \mid d_1)\,dt_0 .$$


FEATURES 

T2POW  allows  the  user  to  process  up  to  25  cases  in  a  single  computer  run.  Each  case 
corresponds  to  a  single  range  of  the  user-specified  Type  I  errors.  T2POW  provides  the  user 
with  the  option  of  either  computing  a  single  power  for  a  specified  sample  size  pair  or 
printing  a  table  of  power  values.  If  the  single  power  option  is  chosen,  the  user  must  also 
specify  the  type  of  test  (i.e.,  one-  or  two-sided),  a  sample  size  pair,  and  a  value  of  delta, 
where  delta  is  the  number  of  standard  deviations  that  the  difference  in  true  means  is  from 
the  hypothesized  difference  of  zero  in  absolute  value.  Hypotheses  (2)  and  (3)  are  one-sided 
tests  while  hypothesis  (1)  is  a  two-sided  test.  If  the  power  table  option  is  chosen,  the  user 
instead  specifies  the  test  type,  and  a  range  of  values  for  the  first  population  sample  size, 
Type  I  error,  and  delta.  In  addition,  a  multiplicative  factor  must  be  provided  which  relates 
the  second  population  sample  size  to  the  first.  Under  the  power  table  option  each  page  of 
T2POW  output  consists  of  the  following: 

*  A power table for the specified test type and Type I error for up to 40 values of
the pair $(n_1, n_2)$ and up to 9 values of delta.


REFERENCES 

1.  Bowker,  A.  H.  and  Lieberman,  G.  J.  (1972),  Engineering  Statistics,  Second  Edition, 
Prentice-Hall,  Inc.,  pp.  235  -  240. 

2.  Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and
Engineering, Second Edition, John Wiley and Sons, Inc., pp. 297 - 299.

3.  DiDonato,  A.  R.  (1988),  unpublished  notes  and  computer  code  for  numerically 
evaluating  a  specialized  infinite  sum  occurring  in  the  expression  for  the  power  of  the 
t  test. 

4.  Freund,  John  E.  (1962),  Mathematical  Statistics,  Prentice-Hall,  Inc.,  pp.  267  -  269. 

5.  Thomas,  M.  A.  (1988),  unpublished  notes  deriving  an  expression  for  the  power  of  the 
t  test  in  terms  of  the  incomplete  beta  function. 

6.  Wine,  R.  L.  (1964),  Statistics  for  Scientists  and  Engineers,  Prentice-Hall,  Inc.,  pp. 
254  -  264. 
INPUT GUIDE


The  user  must  create  a  data  file  containing  the  following  record  types.  The  first  record 
type  specifies  the  number  of  cases  (NCASES)  to  be  run.  The  second  record  type  specifies 
the  power  table  computation  option.  If  the  single  power  computation  option  is  chosen,  the 
third  record  type  specifies  the  values  of  delta,  the  first  population  sample  size,  the  second 
population  sample  size,  the  Type  I  error,  and  the  test  type.  If  the  power  table  option  is 
chosen,  the  third  record  type  is  different  and  a  fourth  record  type  is  also  required.  In  this 
case,  the  third  record  type  specifies  the  limits  for  delta,  the  first  population  sample  size,  and 
Type  I  error  together  with  the  test  type.  The  fourth  record  type  specifies  the  value  of  the 
multiplicative  factor  relating  the  second  population  sample  size  to  the  first.  If  NCASES  is 
greater  than  one,  record  types  2,  3  (3A  or  3B),  and,  if  necessary,  4  must  be  repeated  for 
each  case. 


More  specifically  the  required  data  file  is  constructed  as  follows: 


Record
Type   Variable  Description                                     Columns  Format

1      NCASES    Number of cases                                 1-5      I5

2      IOPTION   Power computation option                        1-5      I5
                 = 0, for power table output
                 = 1, for single power value

The following record type is included only if IOPTION = 1.

3A     D(1)      Delta value                                     1-8      F8.0
                 (D(1) > 0.0)
       FN(1)     1st population sample size                      9-16     F8.0
                 (FN(1) > 0)
       FN2       2nd population sample size                      17-24    F8.0
                 (FN2 > 0)
       ALPH(1)   Type I error                                    25-32    F8.0
                 (0.0 < ALPH(1) < 1.00)
       TYPE      Test type, i.e.,                                33-40    F8.0
                 = 1, for one-sided test
                      (hypothesis (2) or (3))
                 = 2, for two-sided test
                      (hypothesis (1))

The following record type is included only if IOPTION = 0.

3B     DELTALL   Lower limit of delta values                     1-8      F8.0
                 (DELTALL > 0.0)
       DINCR     Delta value increment                           9-16     F8.0
                 (DINCR > 0.0)
       DELTAUL   Upper limit of delta values                     17-24    F8.0
                 (DELTAUL > 0.0)
       FNLL      Lower limit of first population sample size     25-32    F8.0
                 values
                 (FNLL > 0)
       FNINCR    First population sample size value increment    33-40    F8.0
                 (FNINCR > 0)
       FNUL      Upper limit of first population sample size     41-48    F8.0
                 values
                 (FNUL > 0)
       ALPHALL   Lower limit of Type I error values              49-56    F8.0
                 (0.0 < ALPHALL < 1.0)
       AINCR     Type I error value increment                    57-64    F8.0
                 (AINCR > 0.0)
       ALPHAUL   Upper limit of Type I error values              65-72    F8.0
                 (0.0 < ALPHAUL < 1.0)
       TYPE      Test type, i.e.,                                73-80    F8.0
                 = 1, for one-sided test
                      (hypothesis (2) or (3))
                 = 2, for two-sided test
                      (hypothesis (1))

Record type 4 is included only if IOPTION = 0.

4      FACTOR    Sample size multiplicative factor. Satisfies    1-20     F8.0
                 the relationship n2 = FACTOR*n1.
                 (FACTOR > 0.0)

One record each of types 2 and 3 (3A or 3B) is required for each case to be processed.
One record type 4 is required for each case in which IOPTION = 0.


COMMENTS 


Since  the  integrals  in  the  expressions  for  power  cannot  be  evaluated  in  closed  form, 
they  must  be  evaluated  numerically.  For  each  of  the  three  hypotheses  it  can  be  shown  that 
the  power  expression  can  be  transformed  to  an  expression  involving  the  incomplete  beta 
function $I_x(a,b)$, where

$$I_x(a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt ,
  \qquad 0 \le x \le 1, \quad a, b > 0 .$$


Very  good  numerical  routines  are  available  for  evaluating  Ix(a,b)  such  as  BRATIO  in 
MATHLIB.  Transforming  the  power  expressions  results  in  two  expressions  for  power,  one 
for  the  one-sided  test  (hypothesis  (2)  or  (3))  and  one  for  the  two-sided  test  (hypothesis  (1)). 
These  expressions  are  given  below: 


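(1) One-sided test

$$\text{Power} = 0.5\left[ e^{-\delta'^2/2} \sum_{i=0}^{\infty} \frac{(\delta'^2/2)^i}{\Gamma(i+1)}\,
  I_{1-x}(\nu/2,\, i+0.5)
  + e^{-\delta'^2/2} \sum_{i=0}^{\infty} \frac{(\delta'^2/2)^{i+1/2}}{\Gamma(i+3/2)}\,
  I_{1-x}(\nu/2,\, i+1) \right]$$

(2) Two-sided test

$$\text{Power} = e^{-\delta'^2/2} \sum_{i=0}^{\infty} \frac{(\delta'^2/2)^i}{\Gamma(i+1)}\,
  I_{1-x}(\nu/2,\, i+0.5)$$

Note that $I_{1-x}(a,b) = 1 - I_x(b,a)$. Program T2POW utilizes the above expressions with
$\nu = n_1 + n_2 - 2$ and $\delta' = \delta/\sqrt{1/n_1 + 1/n_2}$ to compute the power of the
pooled t test.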
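
As a small illustration of how the two inputs to these expressions arise from the input records, the Python sketch below computes the degrees of freedom and the non-centrality parameter from delta, the first sample size, and FACTOR. The function name `pooled_parameters` is hypothetical, and the sketch leaves FACTOR*n1 unrounded because the manual does not say how T2POW handles a non-integer second sample size.

```python
import math


def pooled_parameters(delta, n1, factor):
    """Return (n2, v, delta_prime) for the pooled t test with n2 = FACTOR * n1."""
    n2 = factor * n1
    v = n1 + n2 - 2                                   # degrees of freedom
    delta_prime = delta / math.sqrt(1.0 / n1 + 1.0 / n2)  # non-centrality parameter
    return n2, v, delta_prime


# Hypothetical table request: delta = 1.0, FACTOR = 2.0, three values of n1.
for n1 in (5, 10, 20):
    print(n1, pooled_parameters(1.0, n1, 2.0))
```

For equal sample sizes the non-centrality parameter reduces to $\delta\sqrt{n_1/2}$.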
CHIVPOW


PURPOSE 

Program CHIVPOW (Power of the Chi-Square Test on a Single Variance) evaluates
the power function for tests of hypotheses regarding the variance of a normal population.
The tests are based on a random sample of size $n$ from a normal population having unknown
variance $\sigma^2$. Let $X$ be a normal random variable with unknown mean $\mu$ and unknown
variance $\sigma^2$. Then the sample mean $\bar{X}$ is a normal random variable with unknown
mean $\mu$ and unknown variance $\sigma^2/n$, where $\bar{X} = \sum_{i=1}^{n} X_i/n$. These tests
are commonly referred to as $\chi^2$ tests on a single variance (see References 1 and 2). The
three relevant hypotheses and their corresponding test statistics, critical regions, and
power functions are given below. In each case,

$\alpha$ = size of the critical region
  = Prob(committing a Type I error)
and the test statistic

$$T_0 = \chi_0^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/\sigma_0^2$$

where $\sigma_0^2$ is the hypothesized value of the true variance $\sigma^2$. The probability
density function of $T_0$ depends on the true value of the parameter $\sigma$ ($\sigma_0$ or
$\sigma_1 \ne \sigma_0$). For $\sigma = \sigma_0$, this density will be denoted by
$g(t_0 \mid \sigma_0)$. $g(t_0 \mid \sigma_0)$ is a chi-square distribution with $\nu$ degrees of
freedom and is given by

$$g(t_0 \mid \sigma_0) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\; t_0^{\nu/2 - 1} e^{-t_0/2},
  \qquad t_0 > 0 .$$

Here $\nu = n - 1$. Define $\lambda = \sigma_1/\sigma_0$ and $c = 1/\lambda$. $\lambda$ ranges from
0 to $\infty$ and has a value of 1 when the hypothesized value of the population standard
deviation equals the true value of that parameter. The critical region for the test is based
on $g(t_0 \mid \sigma_0)$ and the power function is expressed in terms of $\lambda$ and
$g(t_0 \mid \sigma_0)$.


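*********************

Hypothesis (1):    $H_0$: $\sigma = \sigma_0$
                   $H_1$: $\sigma \ne \sigma_0$.

Critical Region:   Reject $H_0$ if $T_0 < \chi^2_{\nu,\alpha/2}$ or if $T_0 > \chi^2_{\nu,1-\alpha/2}$,
                   where $\chi^2_{\nu,1-\alpha/2}$ is the $100(1-\alpha/2)$th percentage point of
                   the chi-square distribution with $\nu$ degrees of freedom. Hence,

                   $$\alpha/2 = \int_{\chi^2_{\nu,1-\alpha/2}}^{\infty} g(t_0 \mid \sigma_0)\,dt_0 .$$

Power Function:    $$\Pi(\sigma_1, n) = \int_{0}^{c^2\chi^2_{\nu,\alpha/2}} g(t_0 \mid \sigma_0)\,dt_0
                     + \int_{c^2\chi^2_{\nu,1-\alpha/2}}^{\infty} g(t_0 \mid \sigma_0)\,dt_0 .$$

****************

Hypothesis (2):    $H_0$: $\sigma \le \sigma_0$
                   $H_1$: $\sigma > \sigma_0$.

Critical Region:   Reject $H_0$ if $T_0 > \chi^2_{\nu,1-\alpha}$. Hence,

                   $$\alpha = \int_{\chi^2_{\nu,1-\alpha}}^{\infty} g(t_0 \mid \sigma_0)\,dt_0 .$$

Power Function:    $$\Pi(\sigma_1, n) = \int_{c^2\chi^2_{\nu,1-\alpha}}^{\infty} g(t_0 \mid \sigma_0)\,dt_0 .$$

****************

Hypothesis (3):    $H_0$: $\sigma \ge \sigma_0$
                   $H_1$: $\sigma < \sigma_0$.

Critical Region:   Reject $H_0$ if $T_0 < \chi^2_{\nu,\alpha}$. Hence,

                   $$\alpha = \int_{0}^{\chi^2_{\nu,\alpha}} g(t_0 \mid \sigma_0)\,dt_0 .$$

Power Function:    $$\Pi(\sigma_1, n) = \int_{0}^{c^2\chi^2_{\nu,\alpha}} g(t_0 \mid \sigma_0)\,dt_0 .$$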
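
Because every power function above is a central chi-square probability with limits scaled by $c^2$, it can be evaluated with only a chi-square distribution function and its inverse. The Python sketch below is an illustration under stated assumptions, not the CHIVPOW code: it uses a simple series for the distribution function and bisection for the percentage points, both of which are adequate only for moderate arguments. The function names are hypothetical; `test_type` follows the TYPE codes of the CHIVPOW input guide (1 two-sided, 2 right-tailed, 3 left-tailed).

```python
import math


def chi2_cdf(t, v):
    """Chi-square distribution function with v degrees of freedom (series form)."""
    if t <= 0.0:
        return 0.0
    a, x = 0.5 * v, 0.5 * t
    term = total = 1.0 / a
    k = 0
    while term > total * 1.0e-16 and k < 100000:
        k += 1
        term *= x / (a + k)
        total += term
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))


def chi2_quantile(p, v):
    """Value q with chi2_cdf(q, v) = p (0 < p < 1), by bracketing and bisection."""
    lo, hi = 0.0, float(v)
    while chi2_cdf(hi, v) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_power(lam, n, alpha, test_type):
    """Power at lambda = sigma1/sigma0 for sample size n and Type I error alpha."""
    v = n - 1
    c2 = 1.0 / (lam * lam)                # c squared, with c = 1/lambda
    if test_type == 1:                    # two-sided, hypothesis (1)
        return (chi2_cdf(c2 * chi2_quantile(alpha / 2.0, v), v)
                + 1.0 - chi2_cdf(c2 * chi2_quantile(1.0 - alpha / 2.0, v), v))
    if test_type == 2:                    # right-tailed, hypothesis (2)
        return 1.0 - chi2_cdf(c2 * chi2_quantile(1.0 - alpha, v), v)
    if test_type == 3:                    # left-tailed, hypothesis (3)
        return chi2_cdf(c2 * chi2_quantile(alpha, v), v)
    raise ValueError("test_type must be 1, 2, or 3")


for lam in (1.0, 1.5, 2.0):
    print(lam, chi2_power(lam, 10, 0.05, 2))
```

At $\lambda = 1$ the scale factor $c^2$ equals 1 and each power function returns the Type I error, which makes a convenient check.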
FEATURES

CHIVPOW  allows  the  user  to  process  up  to  25  cases  in  a  single  computer  run.  Each 
case  corresponds  to  a  single  range  of  the  user-specified  Type  I  errors.  In  CHIVPOW  the 
user  specifies  the  type  of  test  (i.e.,  one-  or  two-sided),  and  a  range  of  values  for  sample  size, 
Type I error, and lambda, where lambda is the ratio of the true population standard
deviation to its hypothesized value. Hypotheses (2) and (3) are one-sided tests while hypothesis
(1)  is  a  two-sided  test.  Each  page  of  CHIVPOW  output  consists  of  the  following: 

*  A  power  table  for  the  specified  test  type  and  Type  I  error  for  up  to  40  values  of  n 
and  up  to  9  values  of  lambda. 


REFERENCES 

1.  Bowker,  A.  H.  and  Lieberman,  G.  J.  (1972),  Engineering  Statistics,  Second  Edition, 
Prentice-Hall,  Inc.,  pp.  207  -  217. 

2.  Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and
Engineering, Second Edition, John Wiley and Sons, Inc., pp. 282 - 285.

3.  Freund,  John  E.  (1962),  Mathematical  Statistics,  Prentice-Hall,  Inc.,  pp.  271  -  272. 


INPUT  GUIDE 

The  user  must  create  a  data  file  containing  the  following  record  types.  The  first  record 
type  specifies  the  number  of  cases  (NCASES)  to  be  run.  The  second  record  type  specifies 
the  limits  for  lambda,  sample  size,  and  Type  I  error  together  with  the  test  type.  If  NCASES 
is  greater  than  one,  record  type  2  must  be  repeated  for  each  case. 

More  specifically  the  required  data  file  is  constructed  as  follows: 


Record
Type   Variable  Description                                     Columns  Format

1      NCASES    Number of cases                                 1-5      I5

2      DLAMBLL   Lower limit of lambda values                    1-8      F8.0
                 (DLAMBLL > 0.0)
       DLINCR    Lambda value increment                          9-16     F8.0
                 (DLINCR > 0.0)
       DLAMBUL   Upper limit of lambda values                    17-24    F8.0
                 (DLAMBUL > 0.0)
       FNLL      Lower limit of sample size values               25-32    F8.0
                 (FNLL > 0)
       FNINCR    Sample size value increment                     33-40    F8.0
                 (FNINCR > 0)
       FNUL      Upper limit of sample size values               41-48    F8.0
                 (FNUL > 0)
       ALPHALL   Lower limit of Type I error values              49-56    F8.0
                 (0.0 < ALPHALL < 1.0)
       AINCR     Type I error value increment                    57-64    F8.0
                 (AINCR > 0.0)
       ALPHAUL   Upper limit of Type I error values              65-72    F8.0
                 (0.0 < ALPHAUL < 1.0)
       TYPE      Test type, i.e.,                                73-80    F8.0
                 = 1, for two-sided test
                      (hypothesis (1))
                 = 2, for one-sided, right-tailed test
                      (hypothesis (2))
                 = 3, for one-sided, left-tailed test
                      (hypothesis (3))

One record type 2 is required for each case to be processed.
FVARPOW


PURPOSE 

Program FVARPOW (Power of the F Test on Two Variances) evaluates the power
function for tests of hypotheses regarding the equality of the variances of two normal
populations. The tests are based on two independent random samples of sizes $n_1$ and
$n_2$, respectively, from two normal populations having unknown variances $\sigma_1^2$ and
$\sigma_2^2$. Let $X_1$ and $X_2$ be normal random variables with unknown means $\mu_1$ and
$\mu_2$ and unknown variances $\sigma_1^2$ and $\sigma_2^2$. Then the sample means $\bar{X}_1$
and $\bar{X}_2$ are normal random variables with unknown means $\mu_1$ and $\mu_2$ and unknown
variances $\sigma_1^2/n_1$ and $\sigma_2^2/n_2$, respectively, where
$\bar{X}_i = \sum_{j=1}^{n_i} X_{ij}/n_i$. These tests are commonly referred to as F tests for
the equality of two variances (see References 1 and 2). The three relevant hypotheses and
their corresponding test statistics, critical regions, and power functions are given below.
In each case,

$\alpha$ = size of the critical region
  = Prob(committing a Type I error)
and the test statistic

$$T_0 = F_0 = S_1^2/S_2^2$$

where $S_i^2$ is the unbiased estimator of the variance $\sigma_i^2$ and is given by

$$S_i^2 = \sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2/(n_i - 1) .$$

Define $\lambda = \sigma_1/\sigma_2$ and $c = 1/\lambda$. The probability density function of
$T_0$ depends on the true value of the parameter $\lambda$ ($\lambda = \lambda_0$ or
$\lambda = \lambda_1 \ne \lambda_0$). $\lambda_0 = 1$ is the hypothesized value of $\lambda$.
For $\lambda = \lambda_0$, this density will be denoted by $g(t_0 \mid \lambda_0)$.
$g(t_0 \mid \lambda_0)$ is an F distribution with parameters $\nu_1$ (numerator degrees of
freedom) and $\nu_2$ (denominator degrees of freedom) and is given by

$$g(t_0 \mid \lambda_0) = \frac{\Gamma[(\nu_1+\nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}
  {\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\;
  \frac{t_0^{\nu_1/2 - 1}}{(1 + \nu_1 t_0/\nu_2)^{(\nu_1+\nu_2)/2}}, \qquad t_0 > 0 .$$

Here $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$. The critical region for the test is based on
$g(t_0 \mid \lambda_0)$ and the power function is expressed in terms of $\lambda_1$ and
$g(t_0 \mid \lambda_0)$.
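****************

Hypothesis (1):    $H_0$: $\sigma_1 = \sigma_2$
                   $H_1$: $\sigma_1 \ne \sigma_2$.

Critical Region:   Reject $H_0$ if $T_0 < F_{\nu_1,\nu_2,\alpha/2}$ or if
                   $T_0 > F_{\nu_1,\nu_2,1-\alpha/2}$, where $F_{\nu_1,\nu_2,1-\alpha/2}$ is the
                   $100(1-\alpha/2)$th percentage point of the F distribution with $\nu_1$ and
                   $\nu_2$ degrees of freedom. Hence,

                   $$\alpha/2 = \int_{F_{\nu_1,\nu_2,1-\alpha/2}}^{\infty} g(t_0 \mid \lambda_0)\,dt_0 .$$

Power Function:    $$\Pi(\lambda_1, n_1, n_2) = \int_{0}^{c^2 F_{\nu_1,\nu_2,\alpha/2}} g(t_0 \mid \lambda_0)\,dt_0
                     + \int_{c^2 F_{\nu_1,\nu_2,1-\alpha/2}}^{\infty} g(t_0 \mid \lambda_0)\,dt_0 .$$

****************

Hypothesis (2):    $H_0$: $\sigma_1 \le \sigma_2$
                   $H_1$: $\sigma_1 > \sigma_2$.

Critical Region:   Reject $H_0$ if $T_0 > F_{\nu_1,\nu_2,1-\alpha}$. Hence,

                   $$\alpha = \int_{F_{\nu_1,\nu_2,1-\alpha}}^{\infty} g(t_0 \mid \lambda_0)\,dt_0 .$$

Power Function:    $$\Pi(\lambda_1, n_1, n_2) = \int_{c^2 F_{\nu_1,\nu_2,1-\alpha}}^{\infty} g(t_0 \mid \lambda_0)\,dt_0 .$$

****************

Hypothesis (3):    $H_0$: $\sigma_1 \ge \sigma_2$
                   $H_1$: $\sigma_1 < \sigma_2$.

Critical Region:   Reject $H_0$ if $T_0 < F_{\nu_1,\nu_2,\alpha}$. Hence,

                   $$\alpha = \int_{0}^{F_{\nu_1,\nu_2,\alpha}} g(t_0 \mid \lambda_0)\,dt_0 .$$

Power Function:    $$\Pi(\lambda_1, n_1, n_2) = \int_{0}^{c^2 F_{\nu_1,\nu_2,\alpha}} g(t_0 \mid \lambda_0)\,dt_0 .$$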
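
The factor $c^2$ in the limits of integration follows from a change of scale. The short derivation below is a sketch in the notation of this section; the symbol $F_{\nu_1,\nu_2}$ for a central F variate and the generic cutoff $k$ are introduced here only for illustration.

```latex
\begin{align*}
\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}
  &= \frac{T_0}{\lambda_1^2} = c^2\,T_0 \;\sim\; F_{\nu_1,\nu_2}
  \qquad \text{when } \sigma_1/\sigma_2 = \lambda_1, \\
\Pr(T_0 > k \mid \lambda_1)
  &= \Pr(c^2\,T_0 > c^2 k)
  = \int_{c^2 k}^{\infty} g(t_0 \mid \lambda_0)\,dt_0 .
\end{align*}
```

Setting $k = F_{\nu_1,\nu_2,1-\alpha}$ gives the power function for hypothesis (2); the lower-tail and two-sided cases follow in the same way.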


FEATURES 

FVARPOW allows the user to process up to 25 cases in a single computer run. Each
case corresponds to a single range of the user-specified Type I errors. FVARPOW provides
the user with the option of either computing a single power for a specified sample size pair
or printing a table of power values. If the single power option is chosen, the user must also
specify the type of test (i.e., one- or two-sided), a sample size pair, and a value of lambda,
where lambda is the ratio of the true numerator population standard deviation to the true
denominator population standard deviation. Hypotheses (2) and (3) are one-sided tests
while hypothesis (1) is a two-sided test. If the power table option is chosen, the user instead
specifies the test type, and a range of values for the numerator population sample size, Type
I error, and lambda. In addition, a multiplicative factor must be provided which relates the
denominator population sample size to the numerator population sample size. Under the
power table option each page of FVARPOW output consists of the following:

*  A power table for the specified test type and Type I error for up to 40 values of
the pair $(n_1, n_2)$ and up to 9 values of lambda.


REFERENCES 

1.  Bowker,  A.  H.  and  Lieberman,  G.  J.  (1972),  Engineering  Statistics,  Second  Edition, 
Prentice-Hall,  Inc.,  pp.  254  -  265. 

2.  Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and
Engineering, Second Edition, John Wiley and Sons, Inc., pp. 285 - 288.

3.  Freund,  John  E.  (1962),  Mathematical  Statistics,  Prentice-Hall,  Inc.,  pp.  273  -  274. 
INPUT GUIDE

The user must create a data file containing the following record types. The first record
type specifies the number of cases (NCASES) to be run. The second record type specifies
the power table computation option. If the single power computation option is chosen, the
third record type specifies the values of lambda, the numerator population sample size, the
denominator population sample size, the Type I error, and the test type. If the power table
option is chosen, the third record type is different and a fourth record type is also required.
In this case, the third record type specifies the limits for lambda, the numerator population
sample size, and Type I error together with the test type. The fourth record type specifies
the value of the multiplicative factor relating the denominator population sample size to the
numerator population sample size. If NCASES is greater than one, record types 2, 3 (3A
or 3B), and, if necessary, 4 must be repeated for each case.

More  specifically  the  required  data  file  is  constructed  as  follows: 

Record
Type   Variable   Description                                   Columns   Format

1      NCASES     Number of cases                               1-5       I5

2      IOPTION    Power computation option                      1-5       I5
                  = 0, for power table output
                  = 1, for single power value

The following record type is included only if IOPTION = 1.

3A     DL(1)      Lambda value                                  1-8       F8.0
                  (DL(1) > 0.0)

       FN(1)      Numerator population sample size              9-16      F8.0
                  (FN(1) > 0)

       FDEN       Denominator population sample size            17-24     F8.0
                  (FDEN > 0)

       ALPH(1)    Type I error                                  25-32     F8.0
                  (0.0 < ALPH(1) < 1.00)

       TYPE       Test type, i.e.,                              33-40     F8.0
                  =1, for one-sided test
                      (hypothesis (2) or (3))
                  =2, for two-sided test
                      (hypothesis (1))
The following record type is included only if IOPTION = 0.

Record
Type   Variable   Description                                   Columns   Format

3B     DLAMBLL    Lower limit of lambda values                  1-8       F8.0
                  (DLAMBLL > 0.0)

       DLINCR     Lambda value increment                        9-16      F8.0
                  (DLINCR > 0.0)

       DLAMBUL    Upper limit of lambda values                  17-24     F8.0
                  (DLAMBUL > 0.0)

       FNLL       Lower limit of numerator population sample    25-32     F8.0
                  size values
                  (FNLL > 0)

       FNINCR     Numerator population sample size value        33-40     F8.0
                  increment
                  (FNINCR > 0)

       FNUL       Upper limit of numerator population sample    41-48     F8.0
                  size values
                  (FNUL > 0)

       ALPHALL    Lower limit of Type I error values            49-56     F8.0
                  (0.0 < ALPHALL < 1.0)

       AINCR      Type I error value increment                  57-64     F8.0
                  (AINCR > 0.0)

       ALPHAUL    Upper limit of Type I error values            65-72     F8.0
                  (0.0 < ALPHAUL < 1.0)

       TYPE       Test type, i.e.,                              73-80     F8.0
                  =1, for two-sided test
                      (hypothesis (1))
                  =2, for one-sided, right-tailed test
                      (hypothesis (2))
                  =3, for one-sided, left-tailed test
                      (hypothesis (3))

Record type 4 is included only if IOPTION = 0.

Record
Type   Variable   Description                                   Columns

4      FACTOR     Sample size multiplicative factor. Satisfies  1-20
                  the relationship n2 = FACTOR*n1.
                  (FACTOR > 0.0)

One record type 2 and 3 (3A or 3B) is required for each case to be processed.
One record type 4 is required for each case in which IOPTION = 0.
FEMPOW


PURPOSE 

Program  FEMPOW  (Fixed  Effects  Model  Power)  computes  the  power  of  the  test  for 
a  fixed  effects  model  one-way  analysis  of  variance  (anova).  The  one-way  anova  fixed 
effects  model  is  written 

\[ Y_{ij} = \mu + A_i + \varepsilon_{ij}, \]

with i = 1, 2, ..., k (number of treatments)

and j = 1, 2, ..., $n_i$ (number of observations for treatment i).

The $A_i$'s represent effects of k fixed or predetermined treatments and are subject to the
constraint

\[ \sum_{i=1}^{k} A_i = 0. \]

The specific k treatment effects are the object of the analyst's interest for the fixed effects
model. The $\varepsilon_{ij}$'s represent random error components and are assumed normally and
independently distributed with mean zero and variance $\sigma^2$. Although the one-way anova model
accommodates different numbers of observations per treatment, the power computations in
FEMPOW are computed only for the case where $n_i = n$, i.e., the number of observations
per treatment is n. FEMPOW was formulated as a planning aid for the analyst who wishes
to compare the effect of number of treatments and sample size per treatment upon power.

The  null  hypothesis  of  interest  is  that  the  treatment  effect  is  equal  to  zero  for  each 
treatment.  For  the  current  model  this  is  expressed  as 

\[ H_0: A_1 = A_2 = \cdots = A_k = 0 \]

versus the alternative hypothesis

\[ H_1: \text{At least one } A_i \text{ is not equal to zero.} \]

Equivalent expressions can be obtained by defining the ith treatment mean as $\mu_i = \mu + A_i$.
Then the null hypothesis becomes

\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_k = \mu \]

and the alternative hypothesis will be

\[ H_1: \text{At least two of the means are not equal.} \]

The test statistic is

\[ F_0 = \frac{\text{Treatment Mean Square from Anova}}{\text{Error Mean Square from Anova}}
       = \frac{ n \sum_{i=1}^{k} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \,/\, (k-1) }
              { \sum_{i=1}^{k} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\cdot})^2 \,/\, [k(n-1)] } \]

which follows an F distribution with (k - 1) and k(n - 1) degrees of freedom if $H_0$ is true.
For a specified value of $\alpha$, the probability of Type I error, one rejects $H_0$ if

\[ F_0 > F(k-1,\, k(n-1),\, 1-\alpha), \]

where F(k - 1, k(n - 1), 1 - $\alpha$) is the 100(1 - $\alpha$)th percentile of the F distribution
with k - 1 and k(n - 1) degrees of freedom.

In the fixed effects model one-way anova, power measures the ability of the test to detect
differences in the treatment effects and is determined by calculating the probability of
rejecting $H_0$ when $H_1$ is true. The power function is expressed as

\[ \Pi(\lambda, k, n) = \mathrm{Prob}\left\{ F(\lambda,\, k-1,\, k(n-1)) > F(k-1,\, k(n-1),\, 1-\alpha)
   \;\middle|\; \sum_{i=1}^{k} A_i^2 \neq 0 \right\} \]

where $F(\lambda, k-1, k(n-1))$ represents the non-central F distribution with non-centrality
parameter

\[ \lambda = n \sum_{i=1}^{k} A_i^2 \,/\, (2\sigma^2) \]

and degrees of freedom k - 1 and k(n - 1). However, without the inconvenience of assigning
specific values to each $A_i$, it is difficult to obtain meaningful power computations
from the above formulation. Reference 1 presents expressions which allow upper and
lower bounds of $\lambda$ to be determined as a function of w, the range of treatment means. Using
these bounds, minimum and maximum values of power can then be computed for specified
values of w, k, n, $\sigma^2$ and $\alpha$. The analyst may easily evaluate the sensitivity of the test for
the fixed effects model by exercising FEMPOW over appropriate sets of values for the
given parameters. Additional discussion of the power function parameters is given in the
COMMENTS section.
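The power function can be evaluated numerically once a value of the non-centrality parameter is chosen. The following Python sketch is not the FEMPOW source; it is a minimal illustration that assumes the non-central F distribution is written as a Poisson mixture of incomplete beta function ratios, with the parameter lambda defined above (n times the sum of squared effects divided by twice the error variance) serving as the Poisson mean. The function names `betainc_reg`, `f_critical` and `fixed_effects_power` are hypothetical.

```python
import math

def betainc_reg(a, b, x):
    """Incomplete beta function ratio I_x(a, b) by a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # Symmetry relation keeps the continued fraction in its fast region.
        return 1.0 - betainc_reg(b, a, 1.0 - x)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return front * h

def f_critical(nu1, nu2, alpha):
    """100(1 - alpha)th percentile of the central F distribution (bisection)."""
    def cdf(f):
        return betainc_reg(nu1 / 2.0, nu2 / 2.0, nu1 * f / (nu1 * f + nu2))
    lo, hi = 0.0, 1.0
    while cdf(hi) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fixed_effects_power(lam, k, n, alpha):
    """Power of the one-way fixed effects F test for non-centrality lam."""
    nu1, nu2 = k - 1, k * (n - 1)
    fcrit = f_critical(nu1, nu2, alpha)
    y = nu1 * fcrit / (nu1 * fcrit + nu2)
    weight = math.exp(-lam)          # Poisson weight for j = 0 (assumes lam < ~700)
    cdf, j = 0.0, 0
    while True:
        cdf += weight * betainc_reg(nu1 / 2.0 + j, nu2 / 2.0, y)
        j += 1
        weight *= lam / j
        if j > lam and weight < 1e-16:
            break
    return 1.0 - cdf
```

With lam = 0 the function returns the Type I error probability, which is a convenient sanity check before exploring other values of k and n.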

References  1  and  2  contain  additional  information  about  the  one-way  analysis  of 
variance  for  the  fixed  effects  model.  Both  references  contain  specific  discussions  as  well 
as  examples  of  the  computation  of  power. 

FEATURES 

The  output  from  FEMPOW  is  the  power  for  the  one-way  fixed  effects  anova  model. 
Power is computed and printed as described below:

* A minimum and maximum value of power is computed for each specified
combination of the power function parameters w, k, n, $\sigma^2$ and $\alpha$.

INPUT GUIDE

The  specifications  of  the  user-created  input  file  are  given  below.  The  input  file  must 
contain  all  six  record  types. 

Record
Type   Variable   Description                                   Columns   Format

1      NM         Number of n values to be processed where n    1-5       I5
                  is the number of observations on each of the
                  k treatments.
                  (NM <= 10)

       NK         Number of k values to be processed where k    6-10      I5
                  is the number of treatments being compared
                  in the anova.
                  (NK <= 10)

       NW         Number of w values to be processed where      11-15     I5
                  w is the range of the means of the k
                  treatment populations.
                  (NW <= 7)

       NSIG       Number of sigma values to be processed        16-20     I5
                  where sigma is the standard deviation of the
                  random error component in the fixed effects
                  anova model.
                  (NSIG <= 7)

       NALPHA     Number of alpha values to be processed        21-25     I5
                  where alpha is the probability of a Type I
                  error.
                  (NALPHA <= 7)

2      NS(I)      Enter NM (<= 10) values of n, the number of   1-50      10I5
                  observations for each of the k treatments.

3      K(I)       Enter NK (<= 10) values of k, the number of   1-50      10I5
                  treatments being compared in the anova.

4      W(I)       Enter NW (<= 7) values of w, the range of     1-70      7F10.4
                  the means of the treatment populations.

5      SIG(I)     Enter NSIG (<= 7) values of sigma, the        1-70      7F10.4
                  standard deviation of the random error
                  component in the fixed effects anova model.
                  See COMMENTS section.

6      ALPHA(I)   Enter NALPHA (<= 7) values of alpha, the      1-70      7F10.4
                  probability of a Type I error.
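As an illustration of the fixed-column layout, the following Python sketch builds the six FEMPOW records as text lines. It is not part of STATLIB; the function name `fempow_records` is hypothetical, and the limits of 10 integer values and 7 real values per record are an assumption read off the 10I5 and 7F10.4 formats.

```python
def fempow_records(ns, ks, ws, sigmas, alphas):
    """Return the six FEMPOW input records as fixed-column strings."""
    if len(ns) > 10 or len(ks) > 10:
        raise ValueError("at most 10 values of n and of k fit one 10I5 record")
    if max(len(ws), len(sigmas), len(alphas)) > 7:
        raise ValueError("at most 7 values fit one 7F10.4 record")
    counts = [len(ns), len(ks), len(ws), len(sigmas), len(alphas)]
    records = ["".join(f"{c:5d}" for c in counts)]      # record 1: NM NK NW NSIG NALPHA
    records.append("".join(f"{v:5d}" for v in ns))       # record 2: NS(I), 10I5
    records.append("".join(f"{v:5d}" for v in ks))       # record 3: K(I), 10I5
    for values in (ws, sigmas, alphas):                  # records 4-6: 7F10.4
        records.append("".join(f"{v:10.4f}" for v in values))
    return records

if __name__ == "__main__":
    for line in fempow_records([5, 10], [3, 4], [1.0, 2.0], [1.0], [0.05]):
        print(line)
```

Writing the returned lines to a file, one per line, gives an input file in the column positions listed in the table.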

COMMENTS 

For cases where the power function parameters k, n, $\sigma^2$ and $\alpha$ are fixed, the value of
power varies for each set of hypothesized treatment effects, $A_i$. Reference 1 provides
expressions which yield minimum and maximum values of $\lambda$ based upon the range of
hypothesized treatment means, w, rather than the individual $A_i$'s. These values of $\lambda$ are

\[ \lambda(\min) = n w^2 / (4\sigma^2) \]

and

\[ \lambda(\max) = \begin{cases}
   k n w^2 / (8\sigma^2), & \text{for } k \text{ even} \\
   (k^2 - 1) n w^2 / (8 k \sigma^2), & \text{for } k \text{ odd.}
   \end{cases} \]

This allows the analyst a simpler way to present the alternative hypothesis, $H_1$, for power
calculations. Once values of the other power function parameters are specified, only a
value for the range of treatment means needs to be assigned rather than values for each
mean.

The selection of values for the parameter $\sigma$, the standard deviation of the random error
component, requires additional remarks. If the analyst has knowledge about the value of
this parameter, then he can easily specify a realistic input value. In the absence of such
information a value would have to be assumed. However, because $\lambda$ is a function of the
ratio of w to $\sigma$, it is possible to "standardize" w in units of $\sigma$. This can be done by using
the value 1 for $\sigma$. For this approach the analyst needs to express w in terms of its ratio to
$\sigma$ rather than in absolute units.
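The two bounds on the non-centrality parameter are simple to evaluate. The Python sketch below only restates the expressions for lambda(min) and lambda(max) given in this section; the function name `lambda_bounds` is hypothetical and is not a STATLIB routine.

```python
def lambda_bounds(w, k, n, sigma):
    """Return (lambda_min, lambda_max) for range w, k treatments, n per treatment."""
    base = n * w * w / (sigma * sigma)      # n * w^2 / sigma^2
    lam_min = base / 4.0
    if k % 2 == 0:
        lam_max = k * base / 8.0
    else:
        lam_max = (k * k - 1) * base / (8.0 * k)
    return lam_min, lam_max

if __name__ == "__main__":
    # Standardized range: w is expressed in units of sigma, so sigma = 1.
    print(lambda_bounds(w=2.0, k=4, n=5, sigma=1.0))
```

For k = 2 the two bounds coincide, as expected when the range fixes both treatment means.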


REMPOW


PURPOSE 


Program  REMPOW  (Random  Effects  Model  Power)  computes  the  power  of  the  test 
for  a  random  effects  model  one-way  analysis  of  variance  (anova).  The  one-way  anova 
random  effects  model  is  written 

\[ Y_{ij} = \mu + A_i + \varepsilon_{ij} \]

with i = 1, 2, ..., k (number of treatments)

and j = 1, 2, ..., $n_i$ (number of observations for treatment i).

The $A_i$'s represent effects of randomly chosen treatment levels which are assumed normally
and independently distributed with mean zero and variance $\sigma_A^2$. The $\varepsilon_{ij}$'s, which represent
random error components, are also assumed normally and independently distributed with
mean zero and variance $\sigma^2$. Although the one-way anova model accommodates different
numbers of observations per treatment, the power computations in REMPOW are computed
only for the case where $n_i = n$, i.e., the number of observations per treatment is n. REMPOW
was formulated as a planning aid for the analyst who wishes to compare the effect of
number of treatments and sample size per treatment upon power.

The null hypothesis of interest is that the variability for the population of treatment
effects is zero. For the current model this is expressed as

\[ H_0: \sigma_A^2 = 0 \]

versus the alternative hypothesis

\[ H_1: \sigma_A^2 > 0. \]


The test statistic is

\[ F_0 = \frac{\text{Treatment Mean Square from Anova}}{\text{Error Mean Square from Anova}}
       = \frac{ n \sum_{i=1}^{k} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \,/\, (k-1) }
              { \sum_{i=1}^{k} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\cdot})^2 \,/\, [k(n-1)] } \]

which follows an F distribution with (k - 1) and k(n - 1) degrees of freedom if $H_0$ is true.
For a specified value of $\alpha$, the probability of Type I error, one rejects $H_0$ if

\[ F_0 > F(k-1,\, k(n-1),\, 1-\alpha), \]

where F(k - 1, k(n - 1), 1 - $\alpha$) is the 100(1 - $\alpha$)th percentile of the F distribution
with k - 1 and k(n - 1) degrees of freedom.
In the random effects model one-way anova, power measures the ability of the test to detect
variation in the treatment effects and is determined by calculating the probability of
rejecting $H_0$ when $H_1$ is true. The power function is expressed as

\[ \Pi(\lambda, k, n) = \mathrm{Prob}\left\{ F_0 > F(k-1,\, k(n-1),\, 1-\alpha) \;\middle|\; \sigma_A^2 > 0 \right\} \]

where $\lambda = (1 + n(\sigma_A^2 / \sigma^2))^{1/2}$. (Under $H_1$ the ratio $F_0 / \lambda^2$ follows the central F
distribution with k - 1 and k(n - 1) degrees of freedom, which is how $\lambda$ enters the
computation.) However, $\lambda$ does not provide an easily interpretable concept
with respect to the variance components. Two other parameters are defined for this
program, each of which possesses this attribute. The first is $R = \sigma_A / \sigma$, which represents the ratio
of the treatment standard deviation to the experimental (random) error standard deviation.
The second parameter is $P = (\sqrt{\sigma_A^2 + \sigma^2} - \sigma) / \sigma$, which represents the proportionate increase
in the total standard deviation of Y which can be attributed to variation between the
treatments. At the user's option $\lambda$, and thus $\Pi$, can be expressed in terms of either of the
parameters, R or P. Both R and P provide convenient measures for relating $\sigma_A$ to $\sigma$, which
assists the analyst in evaluating the sensitivity of the test for a given scenario. The analyst
specifies values of R and/or P of interest and an acceptable value of $\alpha$. He then chooses a
set of values for the sample size n and for the number of treatments k which are appropriate
for the available experimental resources. The resulting power computations enable him to
relate sample size to power for the specified values of R and/or P, k and $\alpha$. The concepts
of R, P and $\lambda$ are discussed in more detail in the COMMENTS section.

References  1  and  2  contain  additional  information  about  the  one-way  analysis  of 
variance  for  the  random  effects  model.  Both  references  contain  specific  discussions  as  well 
as  examples  of  the  computation  of  power. 


FEATURES 

The  output  from  REMPOW  is  the  power  for  the  one-way  random  effects  anova  model. 

Power  can  be  computed  and  printed  as  described  below: 

* Power can be computed and printed as a function of $R = \sigma_A / \sigma$.

* Power can be computed and printed as a function of $P = (\sqrt{\sigma_A^2 + \sigma^2} - \sigma) / \sigma$.

* Power can be computed and printed as a function of R and/or P on the same
computer run.

* For a specified value of $\alpha$ (probability of Type I error), the power can be computed
and printed for up to 9 values of the number of treatments k, up to 40 values of
the sample size n and up to 40 values of either R and/or P on the same computer
run.

REFERENCES

1.  Bowker,  A.  H.  and  Lieberman,  G.  J.  (1972),  Engineering  Statistics,  Second  Edition, 
Prentice-Hall,  Inc.,  pp.  377  -  403. 

2.  Walpole,  R.  E.  and  Myers,  R.  H.  (1985),  Probability  and  Statistics  for  Engineers  and 
Scientists,  Third  Edition,  Macmillan  Publishing  Co.,  pp.  454  -  460  and  pp.  463  -  468. 


INPUT  GUIDE 

The  specifications  of  the  user-created  input  file  are  given  below.  The  input  file  must 
contain  all  five  record  types.  Multiple  cases  are  handled  by  repeating  record  types  2 
through  5  for  each  case. 

Record
Type   Variable   Description                                   Columns   Format

1      NCASES     Number of cases to be processed. A case       1-5       I5
                  represents a set of record types 2 through 5.
                  For a single case the program computes
                  power as a function of R or P.

2      NK         Number of k values to be processed where k    1-5       I5
                  is the number of treatments being compared
                  in the anova.
                  (NK <= 9)

       NN         Number of n values to be processed for this   6-10      I5
                  case where n is the number of observations
                  on each of the k treatments.
                  (NN <= 40)

       NRAT       Number of R values to be processed for this   11-15     I5
                  case. If NRAT > 0, then NPCT (columns
                  16-20) must be 0.
                  (NRAT <= 40)

       NPCT       Number of P values to be processed for this   16-20     I5
                  case. If NPCT > 0, then NRAT (columns
                  11-15) must be 0.
                  (NPCT <= 40)

       ICASE      =1, Power will be computed as a function      21-25     I5
                      of R.
                  =2, Power will be computed as a function
                      of P.

       ALPHA      Enter the value of alpha, the probability of  26-35     F10.5
                  making a Type I error. A Type I error is
                  made by wrongly rejecting H0: sigma_A^2 = 0.

3      FK(I)      Enter NK (<= 9) values of k, the number of    1-80      16F5.0
       I=1,NK     treatments being compared in the anova.

4      FN(I)      Enter NN (<= 40) values of n, the number of   1-80      16F5.0
       I=1,NN     observations on each of the k treatments.

5      RATIO(I)   If ICASE = 1 (columns 21-25, record type      1-80      16F5.0
       I=1,NRAT   2), enter NRAT (<= 40) values of R.
       or         If ICASE = 2, enter NPCT (<= 40) values of
       PCT(I)     P. P is entered as a proportion.
       I=1,NPCT

COMMENTS 

The power function $\Pi$ given earlier contains the parameter

\[ \lambda = (1 + n(\sigma_A^2 / \sigma^2))^{1/2}. \]

However, in order to facilitate the use of the power computations the power function in the
program is formulated in terms of either one of two other parameters. They are

(1) $R = \sigma_A / \sigma$ and

(2) $P = (\sqrt{\sigma_A^2 + \sigma^2} - \sigma) / \sigma$.

If the user chooses to compute power as a function of R, this simply means that $\sigma_A^2 / \sigma^2$ is
replaced by $R^2$ in the power function expression. In the second case it can be shown that

\[ \sigma_A^2 / \sigma^2 = (1 + P)^2 - 1, \]

and this becomes the necessary substitution within the program.
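The two substitutions can be checked numerically. The Python sketch below is an illustration only, with hypothetical function names; it converts either R or P to the variance ratio and then to the parameter lambda defined in this section.

```python
import math

def variance_ratio_from_R(R):
    """sigma_A^2 / sigma^2 when R = sigma_A / sigma is given."""
    return R * R

def variance_ratio_from_P(P):
    """sigma_A^2 / sigma^2 when P = (sqrt(sigma_A^2 + sigma^2) - sigma) / sigma is given."""
    return (1.0 + P) ** 2 - 1.0

def rempow_lambda(n, variance_ratio):
    """lambda = (1 + n * sigma_A^2 / sigma^2) ** (1/2)."""
    return math.sqrt(1.0 + n * variance_ratio)

if __name__ == "__main__":
    # R = 1 and P = sqrt(2) - 1 describe the same variance ratio.
    print(rempow_lambda(4, variance_ratio_from_R(1.0)))
    print(rempow_lambda(4, variance_ratio_from_P(math.sqrt(2.0) - 1.0)))
```

Both calls describe the case in which the treatment and error standard deviations are equal, so they give the same lambda up to rounding.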
PROBABILITY EVALUATION


BINVARP


PURPOSE 

Program BINVARP computes binomial probabilities for the case in which the probability
of success on a single trial is allowed to vary from trial to trial. Specifically,
BINVARP is designed to compute the probability of obtaining x successes in a sequence
of NLIM trials where the success probabilities at successive trials are $p_1, p_2, \ldots, p_{NLIM}$,
respectively. The classical situation in which trials are independent and the probability of
success remains constant from trial to trial is referred to as "Bernoulli trials" in the literature.
The situation addressed in BINVARP is sometimes referred to as "Bernoulli trials
with variable probabilities", but it shows up more frequently in the literature under the
confusing name of "Poisson trials" (see Reference 1).

As  an  example  of  a  situation  to  which  BINVARP  is  applicable,  consider  the  following 
scenario.  Suppose  a  weapon  system  attempts  to  destroy  an  incoming  missile  before  the 
missile  damages  the  weapon  system.  A  sequence  of  NLIM  rounds  is  fired  at  the  missile, 
each  having  a  different  probability  of  hit  due  to  the  changing  range  of  the  missile.  If  K  hits 
are  required  to  destroy  the  missile,  what  is  the  probability  that  the  missile  is  destroyed  prior 
to  impact?  BINVARP  can  be  utilized  to  answer  this  question.  Note  that  in  this  example 
the  probability  of  hit  is  range  dependent.  The  maximum  achievable  range  is  usually  divided 
into  subintervals,  and  a  constant  probability  of  hit  is  associated  with  each.  BINVARP 
assumes that the number of subintervals, and, hence, the number of distinct trial probabilities,
is less than or equal to the total number of rounds fired.

BINVARP employs a recursive computing algorithm which is based on the concept of
a probability generating function. For a detailed development of this algorithm see
References 2 and 3.
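The idea of a generating-function recursion can be sketched in a few lines of Python. The code below is not the BINVARP algorithm of References 2 and 3; it is the standard trial-by-trial convolution, shown under that assumption, and the names `poisson_trial_pmf` and `expand_trials` are hypothetical. Multiplying the generating function by (1 - p + p s) for each trial is equivalent to the update in the inner loop.

```python
import math

def expand_trials(distinct_probs, counts):
    """Repeat each distinct probability for its stated number of trials, in order."""
    return [p for p, c in zip(distinct_probs, counts) for _ in range(c)]

def poisson_trial_pmf(probs):
    """Return [P(0 successes), ..., P(len(probs) successes)] for independent trials."""
    pmf = [1.0]                                  # zero trials: zero successes for sure
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for x, mass in enumerate(pmf):
            nxt[x] += mass * (1.0 - p)           # this trial fails
            nxt[x + 1] += mass * p               # this trial succeeds
        pmf = nxt
    return pmf

def mean_and_sd(probs):
    """Mean and standard deviation of the number of successes."""
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    return mean, math.sqrt(var)

if __name__ == "__main__":
    trials = expand_trials([0.2, 0.5], [3, 2])   # 3 rounds at 0.2, then 2 at 0.5
    pmf = poisson_trial_pmf(trials)
    k = 2
    print("P(at least", k, "hits) =", sum(pmf[k:]))
```

In the missile example, the probability of destroying the target is the sum of the probabilities for K or more hits, as in the last line.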


FEATURES 

BINVARP  allows  the  user  to  process  multiple  cases  in  a  single  computer  run.  For  each 
case  BINVARP  outputs  the  following  information: 

* The total number of binomial trials (NLIM), the number of distinct trial probabilities,
and a frequency table listing the value of each distinct probability and the
number of trials in which it is used.

* A table of probabilities displaying the number of successes (K), the probability of
obtaining exactly K successes, and the cumulative probability of obtaining K or
fewer successes. The user can control the size of this table by specifying the integer
index (KLAST) of the last probability he wishes to have printed out. Hence, the
resultant table will always include probabilities for 0 through KLAST successes.

* The mean and standard deviation of the number of successes in NLIM binomial
trials.


REFERENCES 

1. Feller, William (1965), An Introduction to Probability Theory and Its Applications,
Volume I, Second Edition, John Wiley & Sons, Inc., pp. 205 - 206 and pp. 216 - 217.

2. Thomas, M. A. and Taub, A. E. (1975), Binomial Trials With Variable Probabilities,
NSWC/DL TN-DK-25/75, NSWC, Dahlgren, Virginia 22448.

3. Thomas, M. A. and Taub, A. E. (1982), Calculating Binomial Probabilities When
the Trial Probabilities are Unequal, Journal of Statistical Computation and Simulation,
Volume 14, pp. 125 - 131.


INPUT  GUIDE 

Some  changes  have  been  made  to  the  original  program  since  the  date  of  Reference  2. 
For  this  reason  the  input  guide  specified  below  should  take  precedence  over  the  one  given 
in  Reference  2. 

The  user  must  create  a  data  file  containing  the  following  record  types.  The  first  record 
type  specifies  the  number  of  cases  (NCASES)  to  be  run.  The  second  record  type  specifies 
the  number  of  distinct  trial  probabilities,  the  index  of  the  last  probability  to  be  printed  out, 
and  a  flag  value  indicating  whether  or  not  the  number  of  distinct  trial  probabilities  equals 
the  total  number  of  trials  (NUM).  The  third  record  type  is  included  only  if  the  number  of 
distinct  trial  probabilities  is  less  than  NUM.  It  contains  the  frequencies  of  occurrence  of 
each  of  the  distinct  trial  probabilities.  Record  type  four  is  used  to  input  the  values  of  the 
distinct  trial  probabilities.  If  NCASES  is  greater  than  one,  record  types  2  through  4  must 
be  repeated  for  each  case.  The  value  of  NUM  is  determined  within  BINVARP  and,  hence, 
is not to be input by the user. Nevertheless, BINVARP is limited to a maximum of 325
trials, so the user must ensure that NUM <= 325 for each case he runs. For NUM > 325 the
user is referred to Reference 1 for an approximation.

More  specifically  the  required  data  file  is  constructed  as  follows: 
Record
Type   Variable   Description                                   Columns   Format

1      NCASES     Number of cases                               1-5       I5

2      NP         Number of distinct trial probabilities        1-5       I5

       KLAST      Index of the last probability to be printed   6-10      I5
                  out, i.e., the probability of obtaining
                  KLAST successes in NLIM trials.

       NSS        =0, NP < NLIM                                 11-15     I5
                  =1, NP = NLIM

Record type 3 is included only if NP < NLIM.

3      N(1)       Frequency of occurrence of 1st distinct       1-5       I5
                  trial probability

       N(2)       Frequency of occurrence of 2nd distinct       6-10      I5
                  trial probability

       ...

       N(16)      Frequency of occurrence of 16th distinct      76-80     I5
                  trial probability

Additional records are required in the event that there are more than 16 distinct
trial probabilities.

4      P(1)       1st distinct trial probability                1-10      F10.0

       P(2)       2nd distinct trial probability                11-20     F10.0

       ...

       P(8)       8th distinct trial probability                71-80     F10.0

Additional records are required in the event that there are more than 8 distinct
trial probabilities.


COMMENTS 

The user should recognize that BINVARP assumes that the NP distinct trial probabilities
have been ordered in some meaningful way. The program's algorithm employs the
first distinct trial probability for the first N(1) trials, the second distinct trial probability for
the next N(2) trials, and so forth until the total number of trials has been exhausted.

NEGBIN


PURPOSE 

Program  NEGBIN  (Negative  Binomial)  evaluates  the  negative  binomial  probability 
distribution  and  cumulative  distribution  functions  for  user  specified  parameter  values.  The 
form  chosen  for  the  negative  binomial  distribution  is 

\[ P(X = n) = \binom{n-1}{r-1} p^r (1-p)^{n-r}, \qquad n = r, r+1, \ldots \]

where p is the probability of success on a single trial and X is the number of the trial on which
the rth success occurs. The corresponding cumulative distribution function is given by

\[ F(n) = P(X \le n) = \sum_{t=r}^{n} \binom{t-1}{r-1} p^r (1-p)^{t-r}. \]

The parameters specified by the user are p and r. A value for F(n), e.g., F(n) = C, must
also be specified, and the program will cumulate probabilities until this value is obtained.
With the corresponding value of n determined one can state that the Prob(rth success occurs
on or before the nth trial) is C.

This procedure is oftentimes used in round requirement studies where one has probability
p of hitting a target with a single round, and r hits are required for a target kill. One
can specify a required kill probability C and ascertain the number of rounds n to achieve
it.
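The cumulation described above can be sketched as follows. This Python fragment is an illustration rather than the NEGBIN source; the function name `negbin_table` is hypothetical, and the loop stops at the first n for which the cumulative probability reaches the requested value C.

```python
import math

def negbin_table(p, r, conf):
    """Return rows (n, P(X = n), F(n)) for n = r, r+1, ... until F(n) >= conf."""
    if not (0.0 < p <= 1.0) or r < 1 or not (0.0 < conf < 1.0):
        raise ValueError("need 0 < p <= 1, r >= 1 and 0 < conf < 1")
    rows = []
    n, cum = r, 0.0
    while True:
        pmf = math.comb(n - 1, r - 1) * p ** r * (1.0 - p) ** (n - r)
        cum += pmf
        rows.append((n, pmf, cum))
        if cum >= conf:
            return rows
        n += 1

if __name__ == "__main__":
    # Rounds needed for a 0.90 probability of at least 2 hits when p = 0.3.
    rounds = negbin_table(0.3, 2, 0.90)[-1][0]
    print("rounds required:", rounds)
```

The last row of the table gives the number of rounds required for the specified kill probability.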


FEATURES 

NEGBIN  features  include  the  following: 

* P(n) and F(n) evaluations for each $r \le n \le N$ where N is the smallest integer such
that $F(N) \ge C$.

*  Provisions  for  up  to  1000  multiple  runs. 

REFERENCES 

1.  Feller,  William  (1957).  An  Introduction  to  Probability  Theory  and  Its  Applications, 
John  Wiley  and  Sons,  pp.  155-156. 

2.  Johnson,  Norman  L.  and  Kotz,  Samuel  (1969).  Distributions  in  Statistics  -  Discrete 
Distributions,  Houghton  Mifflin  Company,  pp.  122-142. 
INPUT GUIDE

The user-created input file consists of two record types as described below.

Record
Type   Variable   Description                                   Columns   Format

1      K          Number of cases.                              1-3       I3
                  (K < 1000)

2      P          Single trial probability of success.          1-10      F10.5

       KR         Negative binomial parameter r.                16-20     I5

       CONF       Specification for F(n), i.e., the             25-35     F10.5
                  probability or confidence C which
                  terminates the negative binomial cumulation.

Record 2 is repeated for each case.
CONFIDENCE LIMIT EVALUATION


BINCL


PURPOSE 


Program BINCL (Binomial Confidence Limits) evaluates confidence limits and confidence
bounds (one-sided confidence limits) for the binomial parameter P. This parameter
denotes the true probability of success (failure) in a binomial experiment, i.e., an experiment
consisting of N independent and identical trials where each trial results in a success
or failure. The bounds and limits are computed as a function of the number of trials, N,
and the number of successes (failures), R, in the experiment. The equations defining the
upper, $\bar{P}$, and lower, $\underline{P}$, 100$\gamma$% confidence bounds for P are given by (Reference 2):

\[ \sum_{x=0}^{R} \binom{N}{x} \bar{P}^{x} (1 - \bar{P})^{N-x} = 1 - \gamma \qquad \text{(upper)} \]

\[ \sum_{x=R}^{N} \binom{N}{x} \underline{P}^{x} (1 - \underline{P})^{N-x} = 1 - \gamma \qquad \text{(lower).} \]

In order to find the two-sided 100$\gamma$% confidence limits, $1 - \gamma$ is replaced with $(1 - \gamma)/2$ in the
above equations. These equations are solved using the relationships between the cumulative
binomial distribution and the incomplete beta function ratio. The relationships are given by

\[ \sum_{x=0}^{R} \binom{N}{x} P^{x} (1 - P)^{N-x} = 1 - I_P(R+1,\, N-R) \]

\[ \sum_{x=0}^{R-1} \binom{N}{x} P^{x} (1 - P)^{N-x} = 1 - I_P(R,\, N-R+1) \]

where

\[ I_x(a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^x t^{a-1} (1 - t)^{b-1} \, dt. \]

The 100$\gamma$% upper confidence bound on P is found, therefore, by solving

\[ I_{\bar{P}}(R+1,\, N-R) = \gamma \]

for $\bar{P}$. Similarly, the 100$\gamma$% lower confidence bound is found by solving

\[ I_{\underline{P}}(R,\, N-R+1) = 1 - \gamma \]

for $\underline{P}$. To find the 100$\gamma$% confidence limits on P, the above equations are solved for $\bar{P}$ and
$\underline{P}$ with $(1 + \gamma)/2$ substituted for $\gamma$ on their right hand sides. The solution of $I_x(a, b) = Y$ for
x in the above is accomplished with an inverse incomplete beta function routine which
utilizes the routine developed by DiDonato and Jarnagin (Reference 1) to evaluate $I_x(a, b)$.
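The defining equations can also be solved directly, which gives a way to spot-check tabled values. The Python sketch below is not the BINCL method: instead of an inverse incomplete beta function routine it applies bisection to the cumulative binomial sums, and the function names are hypothetical.

```python
import math

def binom_cdf(r, n, p):
    """P(X <= r) for a binomial variable with n trials and success probability p."""
    return sum(math.comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in range(r + 1))

def upper_bound(r, n, gamma):
    """100*gamma% upper confidence bound: solves P(X <= r | p) = 1 - gamma."""
    if r >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binom_cdf(r, n, mid) > 1.0 - gamma:   # lower tail decreases as p grows
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lower_bound(r, n, gamma):
    """100*gamma% lower confidence bound: solves P(X >= r | p) = 1 - gamma."""
    if r <= 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - binom_cdf(r - 1, n, mid) < 1.0 - gamma:   # upper tail grows with p
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def two_sided_limits(r, n, gamma):
    """100*gamma% confidence limits: each bound is taken at level (1 + gamma) / 2."""
    level = 0.5 * (1.0 + gamma)
    return lower_bound(r, n, level), upper_bound(r, n, level)

if __name__ == "__main__":
    print(two_sided_limits(3, 20, 0.90))
```

For R = 0 the upper bound has the closed form 1 - (1 - gamma)^(1/N), and for R = N the lower bound is (1 - gamma)^(1/N), which makes both cases convenient checks.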
FEATURES

The  confidence  bounds  and  limits  are  computed,  tabled,  and  plotted  according  to  any 
one  of  three  options. 

*  Option  1  -  For  all  values  of  R  =  0  to  N  in  steps  of  J  units. 

*  Option  2  -  For  all  values  of  R  =  R  to  N  in  steps  of  J  units. 

*  Option  3  -  For  all  values  of  R  =  0  to  R  in  steps  of  J  units. 

The plots show the confidence bounds and limits as a function of the number of successes
(failures) R. The table of printed values is formatted to accommodate a maximum of 51
values of R. However, the plots will accommodate an unrestricted number of values.
Multiple values of $\gamma$ and N may be input, and the program will compute a table and a plot
for each combination of values.


REFERENCES 

1. DiDonato, A. R. and Jarnagin, M. P., Jr. (1966), A Method for Computing the Incomplete
Beta Function Ratio, NWL Report No. 1949, Revised, Naval Surface
Warfare Center, Dahlgren, VA.

2.  Mood,  Alexander  M.  and  Graybill,  Franklin  A.  (1963),  Introduction  to  the  Theory 
of  Statistics,  Second  Edition,  McGraw-Hill  Book  Company,  Inc.,  pp.  260-262. 


INPUT  GUIDE 

The  user-created  input  file  consists  of  three  record  types  as  described  below. 


Record
Type   Variable   Description                                   Columns   Format

1      NG         Number of values of gamma to be input.        1-10      I10
                  (NG < 10)

       NNS        Number of values of N to be input.            11-20     I10
                  (NNS < 10)

2      GAMMA(I)   NG gamma values, GAMMA(I), I = 1, NG. If      1-80      8E10.4
                  NG > 8, a second record type 2 will be
                  required.

3      NN(1)      First of NNS values of N.                     1-5       I5

       IOP(1)     Option number associated with the first       6-10      I5
                  value of N, IOP(1) = 1, 2, or 3. (See
                  FEATURES section.)

       JAY(1)     Step size associated with the first value     11-15     I5
                  of N.

       IR(1)      Value of R for the first value of N when      16-20     I5
                  option 2 or 3 is specified.

If NNS > 1, an additional NNS - 1 type 3 records will be required.
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CEPCL 


PURPOSE 


Program CEPCL (Circular Probable Error Confidence Limits) computes point and
interval estimates of the 100Pth circular percentile for a bivariate normal probability density
function. (Program SEPCL, described elsewhere in this report, computes estimates of the
100Pth spherical percentile for the trivariate normal probability density function.) The
program was designed for the analysis of fall-of-shot data but is applicable to any data
drawn from the bivariate normal probability density function


\[
f(x_1, x_2) = \frac{|V|^{-1/2}}{2\pi}\,
e^{-\frac{1}{2}(x-\mu)' V^{-1} (x-\mu)}, \qquad -\infty < x_i < \infty .
\]


In this expression, the mean of the distribution is at $\mu = (\mu_1, \mu_2)$, and the
covariances of the $x_i$ are represented by the 2 x 2 variance-covariance matrix V. The matrix V
is in standard form with the ith diagonal element representing the variance of $x_i$ and the
(i,j)th off-diagonal element representing the covariance between $x_i$ and $x_j$.


By  definition,  the  CEP  is  the  radius  of  the  50  percent  circle  (the  radius  of  the  circle 
which  contains  .50  probability)  centered  on  the  target  center.  In  this  program,  CEP  is 
defined  as  an  origin-centered  circle  which  implies  that  the  target  center  is  located  at  the 
Cartesian  origin.  The  program  does  not  restrict  one  to  the  50  percent  circle  but  allows  the 
percentage  100P  to  be  specified  by  the  user.  The  radius  of  this  origin-centered  circle  which 
contains  100P  percent  of  the  bivariate  probability  is  designated  RP.  Hence,  for  P  =  .50,  RP 
coincides  with  the  classical  CEP. 
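As a quick check on the definition, the circular special case has a closed form. The sketch below assumes a zero mean and equal, uncorrelated variances, in which case the radial miss distance follows a Rayleigh distribution; the function name circular_rp is illustrative and is not part of CEPCL.

```python
import math

def circular_rp(sigma, p=0.5):
    """Radius of the origin-centered circle holding probability p.

    Assumes mu = 0 and V = sigma**2 * I (the circular special case only).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Rayleigh CDF: Prob(r <= R) = 1 - exp(-R**2 / (2 * sigma**2)); solve for R.
    return sigma * math.sqrt(-2.0 * math.log(1.0 - p))

# For P = .50 this is the classical CEP, about 1.1774 * sigma.
cep = circular_rp(1.0, 0.5)
```

For unequal variances or a nonzero mean no such closed form applies, which is why the program relies on numerical integration.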

CEPCL has two user-controlled modes of operation, the parameter mode and the
estimation mode. In the parameter mode, the user has complete knowledge of the values of
$\mu$ and V and is interested only in obtaining the solution for RP, the 100Pth circular
percentile. This is a numerical integration problem for which the software has been formulated
in HP BASIC by DiDonato (Reference 3). For use in STATLIB, this software has been
converted to FORTRAN 77 by Johnson (Reference 5). Hence, given the values of $\mu$ and
V, one can obtain an exact numerical solution for RP. Confidence limits on RP are not
required since one is essentially 100% confident that the solution is the 100Pth circular
percentile for the bivariate normal density in question.

In the estimation mode, one’s information regarding the parameters is in the form of n
observed pairs $(x_1, x_2)$. In the context of weapon accuracy analysis, each observed pair
would represent an impact point in two space assuming target center is at (0, 0). These data
are used to estimate the parameters $\mu$ and V which are subsequently used as input to
obtain a point estimate of RP. This is achieved using the numerical integration procedure
referenced above. To obtain an interval estimate of RP, i.e., to obtain confidence limits for
RP, one needs an analytical form for this estimate of RP. This is provided with an RP
approximation formulated by Grubbs (Reference 4). Using this form, $100\gamma$% confidence
limits on RP are provided by an approximation formulated by Taub and Thomas (Reference 6).

In the parameter case, both the exact numerical integration solution and the analytical
approximation to RP are provided. In the estimation case, both solutions are provided as
point estimates, together with the confidence limits for RP. While the approximate solution
is not really necessary unless confidence limits are desired, it is included to provide a
relatively simple analytical form for RP and to show its closeness to the exact solution. The
approximation to RP and the confidence limit evaluation are discussed in the COMMENTS
section.

To apply the approximation (as well as the exact numerical evaluation), it is assumed
that V is diagonal, i.e., that the $x_i$ are statistically independent. If V is not diagonal on
input (or if V is estimated from observed data), independence is induced by rotating V by
means of an orthogonal transformation. This rotation will be discussed in the COMMENTS
section. At this point, it is sufficient to know that the evaluations of RP and its confidence
limits are based on values of the means $(\mu_1, \mu_2)$ and variances
$(\sigma_1^2, \sigma_2^2)$, either input values (V diagonal), transformed values (V not
diagonal) or estimated values which may or may not have been transformed.


FEATURES 

CEPCL features include the two modes of operation discussed above. In the parameter
mode, the user specifies whether V is or is not diagonal. If diagonal, only the diagonal
elements of V, i.e., the variances, are input. Otherwise, the entire 2 x 2 matrix is input. In
the estimation mode, input consists of n observed pairs from which $\mu$ and V are estimated.
However, there are no internal tests to ascertain if $\mu = 0$, if the off-diagonal elements of V
are zero, and if zero whether the variances are equal. The user controls how the estimates
of the parameters are incorporated in subsequent evaluations by input assumptions which
should be based on external testing. Therefore, in the estimation mode, it is recommended
that two runs be made. The first should be made without any simplifying assumptions. The
output from this run can then be used to perform significance tests on the parameters $\mu$ and
V. A subsequent run should be made if these test results permit any simplifying
assumptions.

Other  features  include  the  following: 

*  A  listing  of  the  first  50  data  points  (estimation  mode). 

*  A printout of the input mean vector $\mu$ and covariance matrix V (parameter mode).

*  A printout of the mean vector $\mu$ and covariance matrix V after rotation (V not
diagonal).

*  A printout of the mean vector $\mu$ and covariance matrix V under user specified
assumptions (estimation mode).

*  A printout of input specifications including user specified assumption (estimation
mode).

*  Computation and printout of RP (parameter mode) or its estimate $\hat{R}_P$ (estimation
mode), confidence limits on RP (estimation mode), and degrees of freedom for the
chi-square approximations.


REFERENCES 

1.  Anderson,  T.  W.  (1958),  An  Introduction  to  Multivariate  Statistical  Analysis,  John 
Wiley  and  Sons,  Inc.,  p.  19. 

2.  Browne,  E.  T.  (1958),  Introduction  to  the  Theory  of  Determinants  and  Matrices, 
The  University  of  North  Carolina  Press,  p.  88  and  p.  106. 

3. DiDonato, A. R. (1987), Integration of the Trivariate Normal Distribution Over an
Offset Sphere and an Inverse Problem, NSWC TR 87-27, NSWC, Dahlgren,
VIRGINIA 22448

4.  Grubbs,  F.  E.  (1964),  “Approximate  Circular  and  Non-Circular  Offset  Probabilities 
of  Hitting,”  Operations  Research,  Vol.  12,  No.  1. 

5.  Johnson,  G.  M.  (1989),  FORTRAN  Conversion  of  Trivariate  Normal  Integration 
Over  an  Offset  Sphere,  unpublished  NSWC  TN,  NSWC,  Dahlgren,  VIRGINIA 
22448 

6.  Taub,  A.  E.  and  Thomas,  M.  A.  (1983),  Confidence  Intervals  for  CEP  When  the 
Errors  are  Elliptical  Normal,  NSWC  TR  83-205,  NSWC,  Dahlgren,  VIRGINIA 
22448 


INPUT  GUIDE 

The  user-created  input  file  consists  of  four  record  types  in  either  mode  of  operation. 

Record
 Type  Variable  Description                                   Columns  Format

   1   LI        =0, parameter mode.                           1-2      I2
                 =1, estimation mode.

       Use records 2, 3, and 4 only if LI = 1.

   2   FORMD     Data input format (in parentheses).           1-80     8A10

   3   N         Number of observed data points.               1-5      I5
                 (N < 10,000)
       IDIAG     =0, analysis assumes V diagonal.              6-10     I5
                 =1, analysis assumes V not diagonal.
       IMU       =0, analysis assumes mu is zero.              11-15    I5
                 =1, analysis assumes mu is not zero.
       IEQ       =0, analysis assumes variances equal.         16-20    I5
                 =1, analysis assumes variances not equal.
       P         Circular proportion requested.                21-30    F10.8
                 (0 < P < 1)
       CCOEF     Confidence coefficient for confidence         31-35    F5.4
                 limits on RP.

   4             Data according to format FORMD.

       Use records 5, 6, and 7 or 8 only if LI = 0.

   5   IDIAG     =0, V is diagonal.                            1-5      I5
                 =1, V is not diagonal.
       P         Circular proportion requested.                6-15     F10.8
                 (0 < P < 1)

   6   MU        Mean vector elements MU(I), I = 1, 2.         1-40     2F20.10

       Use record 7 if IDIAG = 0.

   7   V         Diagonal elements of V, V(I,I), I = 1, 2.     1-40     2F20.10

       Use record 8 if IDIAG = 1.

   8   V         Elements of V, one row per line.
                 V(1,J), J = 1, 2                              1-40     2F20.10
                 V(2,J), J = 1, 2                              1-40     2F20.10
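
The fixed-column layout can be illustrated with a short script that builds a parameter-mode input file (records 1, 5, 6 and 7, with V diagonal). This is only a sketch: the function name cepcl_parameter_records is hypothetical, and it assumes the column positions and formats listed in the INPUT GUIDE (I2, I5, F10.8, 2F20.10).

```python
def cepcl_parameter_records(p, mu, v_diag):
    """Build CEPCL parameter-mode records (LI = 0, IDIAG = 0) as text lines.

    p      : circular proportion requested, 0 < p < 1
    mu     : two mean vector elements
    v_diag : two diagonal elements (variances) of V
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return [
        f"{0:2d}",                                  # record 1: LI, format I2
        f"{0:5d}{p:10.8f}",                         # record 5: IDIAG (I5), P (F10.8)
        f"{mu[0]:20.10f}{mu[1]:20.10f}",            # record 6: MU(1), MU(2), 2F20.10
        f"{v_diag[0]:20.10f}{v_diag[1]:20.10f}",    # record 7: V(1,1), V(2,2), 2F20.10
    ]

records = cepcl_parameter_records(0.5, [0.0, 0.0], [4.0, 1.0])
text = "\n".join(records) + "\n"   # contents of the user-created input file
```

Values too large for a 20-column field would overflow the layout, so a production script would need to check field widths as well.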
COMMENTS

A problem in CEPCL (estimation mode) is that of approximating RP with an analytical
form which can subsequently be used to obtain confidence limits on RP. The problem can
be stated as finding RP such that

\[
\mathrm{Prob}\Big( \sum_i x_i^2 < (R_P)^2 \Big) = P
\]

under the bivariate normal assumptions on the $x_i$. To solve this problem analytically, we
need the distribution of $\Psi^2 = \sum_i x_i^2$. It turns out that $\Psi^2$ is a weighted sum of
non-central chi-square variables with a probability distribution which is too complicated to be
useful. However, this distribution converges to a chi-square distribution as the $\sigma_i^2$
approach equality and the $\mu_i$ approach zero. Hence, the distribution of
$\nu\Psi^2 / E(\Psi^2)$ is approximated with a chi-square distribution with $\nu$ degrees of
freedom. The degrees of freedom are obtained by the “method of matching moments,” i.e.,
$\nu$ is the solution of the equation which equates the variance of $\nu\Psi^2 / E(\Psi^2)$ with
the variance of a chi-square with $\nu$ degrees of freedom. The solution to this equation and
the results of the approximation are shown below:

\[
(R_P)^2 = K \Big( \sum_i \sigma_i^2 + \sum_i \mu_i^2 \Big) \Big/ 2
\]

where

\[
K = 2\chi^2_{\nu,P} / \nu .
\]

In this expression, $\chi^2_{\nu,P}$ is the 100Pth percentile of a chi-square distribution with
$\nu$ degrees of freedom where

\[
\nu = 2m^2 / c
\]

and where m and c are the functions of the $\mu_i$ and $\sigma_i^2$ below:

\[
m = 1 + \sum_i \mu_i^2 \Big/ \sum_i \sigma_i^2
\]

\[
c = 2\Big( \sum_i \sigma_i^4 + 2\sum_i \sigma_i^2 \mu_i^2 \Big) \Big/ \Big( \sum_i \sigma_i^2 \Big)^2 .
\]

The user is referred to Reference 4 for the complete derivation.
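
The approximation can be sketched numerically as follows. The functions chi2_cdf, chi2_percentile and grubbs_rp are illustrative names, not CEPCL routines; the chi-square percentile is obtained here by a series for the incomplete gamma function plus bisection, which is an assumption of this sketch rather than the method used in STATLIB. The independent case (V diagonal) is assumed.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_percentile(p, df):
    """Value x with chi2_cdf(x, df) = p, located by bisection."""
    lo, hi = 0.0, max(1.0, df)
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def grubbs_rp(mu, var, p):
    """Approximate RP from means mu and variances var; returns (RP, nu)."""
    dim = len(var)                       # 2 for the circular case
    s2 = sum(var)
    mu2 = sum(u * u for u in mu)
    m = 1.0 + mu2 / s2
    c = 2.0 * (sum(v * v for v in var)
               + 2.0 * sum(v * u * u for v, u in zip(var, mu))) / s2 ** 2
    nu = 2.0 * m * m / c
    k = dim * chi2_percentile(p, nu) / nu
    return math.sqrt(k * (s2 + mu2) / dim), nu

rp, nu = grubbs_rp([0.0, 0.0], [1.0, 1.0], 0.5)
```

With zero means and equal unit variances the sketch gives nu = 2 and RP of about 1.1774, which agrees with the exact circular result; the approximation error grows as the variances diverge or the means move away from zero.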

In the estimation mode, the value RP is actually an estimate of the true RP. Using the
circumflex (or hat) convention to denote estimates of parameters, one would write

\[
(\hat{R}_P)^2 = \hat{K} \Big( \sum_i \hat{\sigma}_i^2 + \sum_i \hat{\mu}_i^2 \Big) \Big/ 2 .
\]

To place confidence limits on RP, the distribution of

\[
\nu_2 (\hat{R}_P)^2 / (R_P)^2
\]

is approximated with a chi-square distribution with $\nu_2$ degrees of freedom using the
“method of matching moments” described above. It is found that $\nu_2$ has value

\[
\nu_2 = 2n^2 m^2 \Big/ \Big[ nc + 2\sum_i \sigma_i^4 \Big/ \Big( \sum_i \sigma_i^2 \Big)^2 \Big] .
\]

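The approximate $100\gamma$% confidence limits for RP are then taken as

\[
\left(
\frac{\hat{R}_P}{\big[ \chi^2_{\nu_2,(1+\gamma)/2} / \nu_2 \big]^{1/2}} \, , \;
\frac{\hat{R}_P}{\big[ \chi^2_{\nu_2,(1-\gamma)/2} / \nu_2 \big]^{1/2}}
\right) .
\]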

The  user  is  referred  to  Reference  6  for  the  complete  derivation  of  RP  confidence  limits  and 
an  evaluation  of  their  accuracy. 
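
A minimal sketch of the confidence limit computation follows. It assumes that the limits are formed by inverting the chi-square approximation with an equal-tail split, i.e., with the 100(1-gamma)/2 and 100(1+gamma)/2 percentiles of a chi-square with nu2 degrees of freedom; those two percentiles are supplied by the caller (for example from tables). The function names are illustrative only.

```python
import math

def nu2_degrees_of_freedom(n, mu, var):
    """Degrees of freedom nu2 from sample size n, means mu and variances var."""
    s2 = sum(var)
    s4 = sum(v * v for v in var)
    m = 1.0 + sum(u * u for u in mu) / s2
    c = 2.0 * (s4 + 2.0 * sum(v * u * u for v, u in zip(var, mu))) / s2 ** 2
    return 2.0 * n * n * m * m / (n * c + 2.0 * s4 / s2 ** 2)

def rp_confidence_limits(rp_hat, nu2, chi2_low, chi2_high):
    """Lower and upper limits for RP.

    chi2_low and chi2_high are the lower-tail and upper-tail chi-square
    percentiles with nu2 degrees of freedom (equal-tail split assumed).
    """
    lower = rp_hat / math.sqrt(chi2_high / nu2)
    upper = rp_hat / math.sqrt(chi2_low / nu2)
    return lower, upper

nu2 = nu2_degrees_of_freedom(10, [0.0, 0.0], [1.0, 1.0])   # 200/11
```

In practice the estimated means and variances would be substituted for mu and var, since the true parameters are unknown in the estimation mode.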


A brief discussion regarding the rotation of non-diagonal covariance matrices will aid
the user in understanding the output. CEPCL is applicable to any problem where the errors
are bivariate normal whether V is or is not diagonal. However, the routines for computing
RP (exact or approximate) require that V be diagonal, i.e., that the $x_i$ be independent
random variables. Hence, if V is not diagonal on input (parameter input or an estimate from
input data), a rotation must be performed which removes the off-diagonal elements from V and
adjusts the diagonal elements accordingly. This process is based on the following
principles: If the matrix V is symmetric, there exists an orthogonal matrix W such that W'VW
= D where D is diagonal (Reference 2). (An orthogonal matrix W is square and defined
such that WW' = I.) Also, if the random vector x has a multivariate normal distribution
with mean $\mu$ and covariance matrix V, i.e., if

\[
x \sim N(\mu, V) ,
\]

and if y = W'x, then it follows from normal theory (Reference 1) that

\[
y \sim N(W'\mu, W'VW) .
\]

Hence, if x is multivariate normal, it follows that there exists an orthogonal W such that y
= W'x is multivariate normal with mean $W'\mu$ and diagonal covariance matrix D = W'VW.
Furthermore, since W is orthogonal, $\sum_i y_i^2 = \sum_i x_i^2$ so that

\[
\mathrm{Prob}\Big( \sum_i x_i^2 < (R_P)^2 \Big) = \mathrm{Prob}\Big( \sum_i y_i^2 < (R_P)^2 \Big) .
\]

The result is that given W, one can transform from x to y (thus obtaining a new mean and
a new diagonal covariance matrix), and the integral over any circular region is not affected
by the transformation. The orthogonal matrix W is constructed in CEPCL by finding the
eigenvalues of V and the eigenvector associated with each eigenvalue, and then taking W
as the matrix of eigenvectors.
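
For the 2 x 2 case the rotation can be written in closed form. The sketch below is not the CEPCL routine; it assumes a symmetric V and uses the rotation angle that zeroes the off-diagonal element, which yields the same eigenvector matrix W up to ordering and sign. The name rotate_2x2 is illustrative.

```python
import math

def rotate_2x2(mu, v):
    """Return (W, W'mu, diagonal of W'VW) for a symmetric 2 x 2 matrix v."""
    (a, b), (_, d) = v
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    cs, sn = math.cos(theta), math.sin(theta)
    w = [[cs, -sn], [sn, cs]]            # columns are unit eigenvectors of v
    # y = W'x, so the transformed mean is W'mu.
    new_mu = [cs * mu[0] + sn * mu[1], -sn * mu[0] + cs * mu[1]]
    # Diagonal elements of D = W'VW (the transformed variances).
    d1 = a * cs * cs + 2.0 * b * cs * sn + d * sn * sn
    d2 = a * sn * sn - 2.0 * b * cs * sn + d * cs * cs
    return w, new_mu, [d1, d2]

w, new_mu, new_var = rotate_2x2([1.0, 2.0], [[4.0, 1.5], [1.5, 1.0]])
```

The transformed variances sum to the trace of V and the squared length of the mean vector is unchanged, which is why probabilities over origin-centered circles are preserved.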
SEPCL


PURPOSE 


Program SEPCL (Spherical Probable Error Confidence Limits) computes point and
interval estimates of the 100Pth spherical percentile for a trivariate normal probability
density function. (Program CEPCL, described elsewhere in this report, computes estimates
of the 100Pth circular percentile for the bivariate normal probability density function.) The
program was designed for the analysis of fall-of-shot data but is applicable to any data
drawn from the trivariate normal probability density function


\[
f(x_1, x_2, x_3) = \frac{|V|^{-1/2}}{(2\pi)^{3/2}}\,
e^{-\frac{1}{2}(x-\mu)' V^{-1} (x-\mu)}, \qquad -\infty < x_i < \infty .
\]


In this expression, the mean of the distribution is at $\mu = (\mu_1, \mu_2, \mu_3)$, and the
covariances of the $x_i$ are represented by the 3 x 3 variance-covariance matrix V. The matrix V
is in standard form with the ith diagonal element representing the variance of $x_i$ and the
(i,j)th off-diagonal element representing the covariance between $x_i$ and $x_j$.


By  definition,  the  SEP  is  the  radius  of  the  50  percent  sphere  (the  radius  of  the  sphere 
which  contains  .50  probability)  centered  on  the  target  center.  In  this  program,  SEP  is  de¬ 
fined  as  an  origin-centered  sphere  which  implies  that  the  target  center  is  located  at  the 
Cartesian  origin.  The  program  does  not  restrict  one  to  the  50  percent  sphere  but  allows  the 
percentage  100P  to  be  specified  by  the  user.  The  radius  of  this  origin-centered  sphere 
which  contains  100P  percent  of  the  trivariate  probability  is  designated  RP.  Hence,  for  P  = 
.50,  RP  coincides  with  the  classical  SEP. 
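
The spherical special case gives a convenient check value. The sketch below assumes a zero mean and equal, uncorrelated variances, so that the radial miss distance follows a Maxwell distribution; the radius is found by bisection. The function names are illustrative and are not part of SEPCL.

```python
import math

def sphere_prob(r, sigma=1.0):
    """Prob(x1^2 + x2^2 + x3^2 <= r^2) for mu = 0 and V = sigma**2 * I."""
    t = r / sigma
    return (math.erf(t / math.sqrt(2.0))
            - math.sqrt(2.0 / math.pi) * t * math.exp(-0.5 * t * t))

def spherical_rp(sigma, p=0.5):
    """Radius of the origin-centered sphere holding probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, sigma
    while sphere_prob(hi, sigma) < p:    # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):                 # bisection
        mid = 0.5 * (lo + hi)
        if sphere_prob(mid, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sep = spherical_rp(1.0, 0.5)   # classical SEP, roughly 1.538 * sigma
```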


SEPCL has two user-controlled modes of operation, the parameter mode and the
estimation mode. In the parameter mode, the user has complete knowledge of the values of
$\mu$ and V and is interested only in obtaining the solution for RP, the 100Pth spherical
percentile. This is a numerical integration problem for which the software has been formulated
in HP BASIC by DiDonato (Reference 3). For use in STATLIB, this software has been
converted to FORTRAN 77 by Johnson (Reference 5). Hence, given the values of $\mu$ and
V, one can obtain an exact numerical solution for RP. Confidence limits on RP are not
required since one is essentially 100% confident that the solution is the 100Pth spherical
percentile for the trivariate normal density in question.

In the estimation mode, one’s information regarding the parameters is in the form of n
observed 3-tuples $(x_1, x_2, x_3)$. In the context of weapon accuracy analysis, each observed
3-tuple would represent an impact point in three space assuming target center is at (0, 0, 0).
These data are used to estimate the parameters $\mu$ and V which are subsequently used as
input to obtain a point estimate of RP. This is achieved using the numerical integration
procedure referenced above. To obtain an interval estimate of RP, i.e., to obtain confidence
limits for RP, one needs an analytical form for this estimate of RP. This is provided with an
RP approximation formulated by Grubbs (Reference 4). Using this form, $100\gamma$%
confidence limits on RP are provided by an approximation formulated by Taub and Thomas
(Reference 6).

In the parameter case, both the exact numerical integration solution and the analytical
approximation to RP are provided. In the estimation case, both solutions are provided as
point estimates, together with the confidence limits for RP. While the approximate solution
is not really necessary unless confidence limits are desired, it is included to provide a
relatively simple analytical form for RP and to show its closeness to the exact solution. The
approximation to RP and the confidence limit evaluation are discussed in the COMMENTS
section.

To apply the approximation (as well as the exact numerical evaluation), it is assumed
that V is diagonal, i.e., that the $x_i$ are statistically independent. If V is not diagonal on
input (or if V is estimated from observed data), independence is induced by rotating V by
means of an orthogonal transformation. This rotation will be discussed in the COMMENTS
section. At this point, it is sufficient to know that the evaluations of RP and its confidence
limits are based on values of the means $(\mu_1, \mu_2, \mu_3)$ and variances
$(\sigma_1^2, \sigma_2^2, \sigma_3^2)$, either input values (V diagonal), transformed values
(V not diagonal) or estimated values which may or may not have been transformed.


FEATURES 

SEPCL features include the two modes of operation discussed above. In the parameter
mode, the user specifies whether V is or is not diagonal. If diagonal, only the diagonal
elements of V, i.e., the variances, are input. Otherwise, the entire 3 x 3 matrix is input. In
the estimation mode, input consists of n observed 3-tuples from which $\mu$ and V are
estimated. However, there are no internal tests to ascertain if $\mu = 0$, if the off-diagonal
elements of V are zero, and if zero whether the variances are equal. The user controls how
the estimates of the parameters are incorporated in subsequent evaluations by input
assumptions which should be based on external testing. Therefore, in the estimation mode,
it is recommended that two runs be made. The first should be made without any
simplifying assumptions. The output from this run can then be used to perform significance tests
on the parameters $\mu$ and V. A subsequent run should be made if these test results permit
any simplifying assumptions.

Other  features  include  the  following: 

*  A  listing  of  the  first  50  data  points  (estimation  mode). 

*  A printout of the input mean vector $\mu$ and covariance matrix V (parameter mode).

*  A printout of the mean vector $\mu$ and covariance matrix V after rotation (V not
diagonal).

*  A printout of the mean vector $\mu$ and covariance matrix V under user specified
assumptions (estimation mode).

*  A printout of input specifications including user specified assumption (estimation
mode).

*  Computation and printout of RP (parameter mode) or its estimate $\hat{R}_P$ (estimation
mode), confidence limits on RP (estimation mode), and degrees of freedom for the
chi-square approximations.


REFERENCES

1. Anderson, T. W. (1958), An Introduction to Multivariate Statistical Analysis, John
Wiley and Sons, Inc., p. 19.

2. Browne, E. T. (1958), Introduction to the Theory of Determinants and Matrices,
The University of North Carolina Press, p. 88 and p. 106.

3. DiDonato, A. R. (1987), Integration of the Trivariate Normal Distribution Over an
Offset Sphere and an Inverse Problem, NSWC TR 87-27, NSWC, Dahlgren,
VIRGINIA 22448

4. Grubbs, F. E. (1964), “Approximate Circular and Non-Circular Offset Probabilities
of Hitting,” Operations Research, Vol. 12, No. 1.

5. Johnson, G. M. (1989), FORTRAN Conversion of Trivariate Normal Integration
Over an Offset Sphere, unpublished NSWC TN, NSWC, Dahlgren, VIRGINIA
22448

6. Taub, A. E. and Thomas, M. A. (1983), Confidence Intervals for CEP When the
Errors are Elliptical Normal, NSWC TR 83-205, NSWC, Dahlgren, VIRGINIA
22448


INPUT  GUIDE 

The  user-created  input  file  consists  of  four  record  types  in  either  mode  of  operation. 


Record
 Type  Variable  Description                                   Columns  Format

   1   LI        =0, parameter mode.                           1-2      I2
                 =1, estimation mode.

       Use records 2, 3, and 4 only if LI = 1.

   2   FORMD     Data input format (in parentheses).           1-80     8A10

   3   N         Number of observed data points.               1-5      I5
                 (N < 10,000)
       IDIAG     =0, analysis assumes V diagonal.              6-10     I5
                 =1, analysis assumes V not diagonal.
       IMU       =0, analysis assumes mu is zero.              11-15    I5
                 =1, analysis assumes mu is not zero.
       IEQ       =0, analysis assumes variances equal.         16-20    I5
                 =1, analysis assumes variances not equal.
       P         Spherical proportion requested.               21-30    F10.8
                 (0 < P < 1)
       CCOEF     Confidence coefficient for confidence         31-35    F5.4
                 limits on RP.

   4             Data according to format FORMD.

       Use records 5, 6, and 7 or 8 only if LI = 0.

   5   IDIAG     =0, V is diagonal.                            1-5      I5
                 =1, V is not diagonal.
       P         Spherical proportion requested.               6-15     F10.8
                 (0 < P < 1)

   6   MU        Mean vector elements MU(I), I = 1, 2, 3.      1-60     3F20.10

       Use record 7 if IDIAG = 0.

   7   V         Diagonal elements of V, V(I,I), I = 1, 2, 3.  1-60     3F20.10

       Use record 8 if IDIAG = 1.

   8   V         Elements of V, one row per line.
                 V(1,J), J = 1, 2, 3                           1-60     3F20.10
                 V(2,J), J = 1, 2, 3                           1-60     3F20.10
                 V(3,J), J = 1, 2, 3                           1-60     3F20.10
COMMENTS

A problem in SEPCL (estimation mode) is that of approximating RP with an analytical
form which can subsequently be used to obtain confidence limits on RP. The problem can
be stated as finding RP such that

\[
\mathrm{Prob}\Big( \sum_i x_i^2 < (R_P)^2 \Big) = P
\]

under the trivariate normal assumptions on the $x_i$. To solve this problem analytically, we
need the distribution of $\Psi^2 = \sum_i x_i^2$. It turns out that $\Psi^2$ is a weighted sum of
non-central chi-square variables with a probability distribution which is too complicated to be
useful. However, this distribution converges to a chi-square distribution as the $\sigma_i^2$
approach equality and the $\mu_i$ approach zero. Hence, the distribution of
$\nu\Psi^2 / E(\Psi^2)$ is approximated with a chi-square distribution with $\nu$ degrees of
freedom. The degrees of freedom are obtained by the “method of matching moments,” i.e.,
$\nu$ is the solution of the equation which equates the variance of $\nu\Psi^2 / E(\Psi^2)$ with
the variance of a chi-square with $\nu$ degrees of freedom. The solution to this equation and
the results of the approximation are shown below:

\[
(R_P)^2 = K \Big( \sum_i \sigma_i^2 + \sum_i \mu_i^2 \Big) \Big/ 3
\]

where

\[
K = 3\chi^2_{\nu,P} / \nu .
\]

In this expression, $\chi^2_{\nu,P}$ is the 100Pth percentile of a chi-square distribution with
$\nu$ degrees of freedom where

\[
\nu = 2m^2 / c
\]

and where m and c are the functions of the $\mu_i$ and $\sigma_i^2$ below:

\[
m = 1 + \sum_i \mu_i^2 \Big/ \sum_i \sigma_i^2
\]

\[
c = 2\Big( \sum_i \sigma_i^4 + 2\sum_i \sigma_i^2 \mu_i^2 \Big) \Big/ \Big( \sum_i \sigma_i^2 \Big)^2 .
\]

The user is referred to Reference 4 for the complete derivation.
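
As a worked check of these formulas, consider the special case of zero means and equal variances, $\mu_i = 0$ and $\sigma_i^2 = \sigma^2$ for i = 1, 2, 3. This case is chosen here only for illustration; in it the approximation reduces to the exact result, because $\Psi^2/\sigma^2$ is then exactly chi-square with 3 degrees of freedom.

```latex
\begin{align*}
m &= 1 + \frac{0}{3\sigma^2} = 1, \\
c &= \frac{2\,(3\sigma^4 + 0)}{(3\sigma^2)^2} = \frac{2}{3}, \\
\nu &= \frac{2m^2}{c} = 3, \\
K &= \frac{3\chi^2_{3,P}}{\nu} = \chi^2_{3,P}, \\
(R_P)^2 &= \frac{K\,(3\sigma^2 + 0)}{3} = \sigma^2 \chi^2_{3,P} .
\end{align*}
```

For P = .50, $\chi^2_{3,.50}$ is approximately 2.366, giving RP of approximately 1.538 $\sigma$.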

In the estimation mode, the value RP is actually an estimate of the true RP. Using the
circumflex (or hat) convention to denote estimates of parameters, one would write

\[
(\hat{R}_P)^2 = \hat{K} \Big( \sum_i \hat{\sigma}_i^2 + \sum_i \hat{\mu}_i^2 \Big) \Big/ 3 .
\]

To place confidence limits on RP, the distribution of

\[
\nu_2 (\hat{R}_P)^2 / (R_P)^2
\]

is approximated with a chi-square distribution with $\nu_2$ degrees of freedom using the
“method of matching moments” described above. It is found that $\nu_2$ has value

\[
\nu_2 = 2n^2 m^2 \Big/ \Big[ nc + 2\sum_i \sigma_i^4 \Big/ \Big( \sum_i \sigma_i^2 \Big)^2 \Big] .
\]


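The approximate $100\gamma$% confidence limits for RP are then taken as

\[
\left(
\frac{\hat{R}_P}{\big[ \chi^2_{\nu_2,(1+\gamma)/2} / \nu_2 \big]^{1/2}} \, , \;
\frac{\hat{R}_P}{\big[ \chi^2_{\nu_2,(1-\gamma)/2} / \nu_2 \big]^{1/2}}
\right) .
\]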


The  user  is  referred  to  Reference  6  for  the  complete  derivation  of  RP  confidence  limits  and 
an  evaluation  of  their  accuracy. 


A brief discussion regarding the rotation of non-diagonal covariance matrices will aid
the user in understanding the output. SEPCL is applicable to any problem where the errors
are trivariate normal whether V is or is not diagonal. However, the routines for computing
RP (exact or approximate) require that V be diagonal, i.e., that the $x_i$ be independent
random variables. Hence, if V is not diagonal on input (parameter input or an estimate from
input data), a rotation must be performed which removes the off-diagonal elements from V
and adjusts the diagonal elements accordingly. This process is based on the following
principles: If the matrix V is symmetric, there exists an orthogonal matrix W such that
W'VW = D where D is diagonal (Reference 2). (An orthogonal matrix W is square and
defined such that WW' = I.) Also, if the random vector x has a multivariate normal
distribution with mean $\mu$ and covariance matrix V, i.e., if

\[
x \sim N(\mu, V) ,
\]

and if y = W'x, then it follows from normal theory (Reference 1) that

\[
y \sim N(W'\mu, W'VW) .
\]

Hence, if x is multivariate normal, it follows that there exists an orthogonal W such that y
= W'x is multivariate normal with mean $W'\mu$ and diagonal covariance matrix D = W'VW.
Furthermore, since W is orthogonal, $\sum_i y_i^2 = \sum_i x_i^2$ so that

\[
\mathrm{Prob}\Big( \sum_i x_i^2 < (R_P)^2 \Big) = \mathrm{Prob}\Big( \sum_i y_i^2 < (R_P)^2 \Big) .
\]

The result is that given W, one can transform from x to y (thus obtaining a new mean and
a new diagonal covariance matrix), and the integral over any spherical region is not affected
by the transformation. The orthogonal matrix W is constructed in SEPCL by finding the
eigenvalues of V and the eigenvector associated with each eigenvalue, and then taking W
as the matrix of eigenvectors.
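
The construction of W can be sketched with a cyclic Jacobi iteration, a standard method for symmetric matrices. This technique is substituted here for illustration only; the text does not say which eigenvalue routine SEPCL uses, and the names jacobi_eigen and rotate_mean are hypothetical.

```python
import math

def jacobi_eigen(v, max_sweeps=50, tol=1e-24):
    """Diagonalize a symmetric matrix v by Jacobi rotations.

    Returns (eigenvalues, W); the columns of W are orthonormal eigenvectors,
    so that W'VW is diagonal with the eigenvalues on the diagonal.
    """
    n = len(v)
    a = [list(row) for row in v]
    w = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes element (p, q) of J'AJ.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):          # A <- J'A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):          # W <- W J accumulates eigenvectors
                    wkp, wkq = w[k][p], w[k][q]
                    w[k][p] = c * wkp - s * wkq
                    w[k][q] = s * wkp + c * wkq
    return [a[i][i] for i in range(n)], w

def rotate_mean(w, mu):
    """Transformed mean W'mu."""
    n = len(mu)
    return [sum(w[k][j] * mu[k] for k in range(n)) for j in range(n)]

v = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
mu = [1.0, -2.0, 0.5]
variances, w = jacobi_eigen(v)
new_mu = rotate_mean(w, mu)
```

The eigenvalues returned are the variances of the rotated, independent coordinates, and new_mu is the mean in the rotated frame; these are the quantities the RP routines would then use.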
LD50EST


PURPOSE 

Program LD50EST (Lethal Dose Estimation) computes estimates of the mean and
standard deviation of an assumed normal distribution based on quantal response data.
Quantal response data refers to a situation where a stimulus is applied to a test unit and the
response is either a success or failure, a go or a no-go, etc. Examples of quantal response
experimentation are found in experimental areas such as explosive sensitivity (drop tests,
fragment impact tests, etc.) and chemical sensitivity (insecticide tests, drug tests, etc.). The
latter area originated the term Lethal Dose 50 (LD50) as a synonym for the median (and
mean) of the assumed normal distribution for the population under consideration.

The examples mentioned fall into a special category of statistics called sensitivity
analysis. Sensitivity analysis is based upon certain assumptions with respect to the testing
environment. Each test unit is assumed to be associated with a critical stimulus level.
When a unit is subjected to a stimulus less than its critical level it does not respond (failure).
Conversely, when a unit is subjected to a stimulus greater than its critical level it responds
positively (success). The distribution of critical stimulus levels of items from a particular
population is assumed to be normal with mean, $\mu$ (LD50), and standard deviation,
$\sigma$. Response data obtained over several stimulus levels are analyzed by the method of
maximum likelihood to produce estimates, $\hat{\mu}$ (LD50) and $\hat{\sigma}$, of the
population parameters. A modified Newton-Raphson procedure is used to iteratively solve the
maximum likelihood equations for the estimated values. Two requirements must be satisfied by
the data in order for the solution procedure to obtain estimates of the parameters. The first
requirement is that the average of the stimulus values which yielded successes must exceed
the average of the stimulus values which yielded failures. The second requirement is that a
zone of mixed results (ZMR) must exist. A ZMR exists when the maximum stimulus level
associated with a failure exceeds the minimum stimulus level associated with a success. The
variance-covariance matrix for $\hat{\mu}$ and $\hat{\sigma}$ is also computed by the program.
The program computes point estimates and approximate confidence intervals for stimulus
levels associated with user-specified probabilities of a success (positive response).
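
The two data requirements can be checked before a run with a few lines of code. The sketch below is not part of LD50EST; the function name check_quantal_data is hypothetical, and it assumes the averages are taken over individual trials (each level weighted by its count of successes or failures).

```python
def check_quantal_data(levels, successes, failures):
    """Return (means_ok, zmr_ok) for quantal response data.

    levels[i]    : stimulus level
    successes[i] : number of positive responses observed at levels[i]
    failures[i]  : number of negative responses observed at levels[i]
    """
    n_s, n_f = sum(successes), sum(failures)
    if n_s == 0 or n_f == 0:
        return False, False              # both response types are needed
    mean_s = sum(x * k for x, k in zip(levels, successes)) / n_s
    mean_f = sum(x * k for x, k in zip(levels, failures)) / n_f
    means_ok = mean_s > mean_f           # requirement 1
    max_fail = max(x for x, k in zip(levels, failures) if k > 0)
    min_succ = min(x for x, k in zip(levels, successes) if k > 0)
    zmr_ok = max_fail > min_succ         # requirement 2: zone of mixed results
    return means_ok, zmr_ok

mixed = check_quantal_data([1.0, 2.0, 3.0, 4.0], [0, 1, 2, 3], [3, 2, 1, 0])
separated = check_quantal_data([1.0, 2.0, 3.0, 4.0], [0, 0, 3, 3], [3, 3, 0, 0])
```

In the second data set every failure lies below every success, so no zone of mixed results exists and the solution procedure would not be expected to produce estimates.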

References 2 and 3 present the early development of statistical methods for analysis of
sensitivity data. Reference 1 presents the formulation used in the original computer
program and some statistical results from several sample data sets. The current STATLIB
version of the program has been modified from the version presented in Reference 1.
These modifications include the addition of computations for point and interval estimation
of stimulus levels associated with specified success probabilities and the deletion of plotting
options for simultaneous confidence ellipses for $\mu$ and $\sigma$.
FEATURES

The  output  features  of  LD50EST  include  the  following: 

*  Printout  of  the  stimulus  levels  and  the  number  of  “successes”  and  “failures”  at 
each  level. 

*  Estimates of the mean, $\hat{\mu}$ (LD50), and the standard deviation, $\hat{\sigma}$, of
the assumed normal distribution of critical stimulus levels.

*  Variance-covariance matrix for $\hat{\mu}$ (LD50) and $\hat{\sigma}$.

*  Point  estimation  and  approximate  interval  estimation  of  stimulus  levels  for  user- 
specified  success  probability  values  and  confidence  coefficient. 
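
Under the normal assumption, the point estimate of the stimulus level for a given success probability p is the corresponding quantile of the fitted distribution. A minimal sketch follows; the function name stimulus_level is illustrative, and the interval estimate is not shown because its formulation is not given in this section.

```python
from statistics import NormalDist

def stimulus_level(mu_hat, sigma_hat, p):
    """Stimulus level at which the estimated success probability equals p.

    mu_hat, sigma_hat : estimated mean (LD50) and standard deviation
    p                 : success probability, 0 < p < 1
    """
    return NormalDist(mu_hat, sigma_hat).inv_cdf(p)

ld50 = stimulus_level(10.0, 2.0, 0.50)    # equals mu_hat
ld97 = stimulus_level(10.0, 2.0, 0.975)   # about mu_hat + 1.96 * sigma_hat
```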

REFERENCES 

1. DiDonato, A. R. and Jarnagin, M. P., Jr. (1972), Use of the Maximum Likelihood
Method under Quantal Responses for Estimating the Parameters of a Normal
Distribution and Its Application to an Armor Penetration Problem, NWL Technical
Report TR-2846, NSWC, Dahlgren, VIRGINIA 22448.

2.  Dixon,  W.  J.  and  Mood,  A.  M.  (1948),  “A  Method  for  Obtaining  and  Analyzing 
Sensitivity  Data”,  Journal  of  the  American  Statistical  Association,  Vol.  43,  pp.  109  - 
126. 

3.  Golub,  A.  and  Grubbs,  F.  (1956),  “Analysis  of  Sensitivity  Experiments  when  the 
Levels  of  the  Stimulus  Cannot  be  Controlled”,  Journal  of  the  American  Statistical 
Association,  Vol.  51,  pp.  257  -  265. 

INPUT  GUIDE 

The  specifications  of  the  user-created  input  file  are  given  below.  The  file  must  contain 
record  types  1,  2,  3,  4,  5A  and  5B  for  analysis  of  a  single  data  set.  Multiple  data  sets  can 
be processed with a single input file by consecutively including corresponding sets of
record types 1 through 5 (A and B).

Record
Type    Variable     Description                                         Columns  Format

1       IDENT        Problem description.                                1-72     9A8

2       NL           Number of unique stimulus levels where positive     1-4      I4
                     responses (“successes”) occurred.

        ML           Number of unique stimulus levels where negative     5-8      I4
                     responses (“failures”) occurred.

        INPUT        = 0, default starting values for the iterative      9-10     I2
                     solution process will be computed for estimates
                     of alpha and beta. See COMMENTS.
                     = 1, starting values for the iterative solution
                     process will be input for estimates of alpha and
                     beta. See COMMENTS.

        ITE          = 0, no printout of intermediate estimates of       11-12    I2
                     alpha and beta at each iteration of the solution.
                     = 1, intermediate estimates of alpha and beta at
                     each iteration of the solution are printed.

        ALPHAO       Starting value for alpha.                           13-32    E20.14

        BETAO        Starting value for beta.                            33-52    E20.14

        CCOF         Confidence coefficient for interval estimation      53-60    F8.4
                     of stimulus levels.
                     (0 < CCOF < 1)

3       NPL          Number of success probability values for which      1-5      I5
                     point and interval estimation is requested.
                     (1 ≤ NPL ≤ 15)

        PL(I)        Success probability values for point and interval   6-80     15F5.3
                     estimation of stimulus levels. NPL values must be
                     entered.

4       FORM         Format (in parentheses) for reading response        1-80     10A8
                     data. The response data consists of two
                     attributes at each stimulus level, the number of
                     responses (“successes” or “failures”) and
                     associated stimulus level. The number of
                     responses must be read as an integer variable.
                     Example format: (10(I2,F6.1))

5A      FNA(i),      Enter response data for “successes”. Data           See      FORM
        A(i),        consists of NL pairs of numbers. The first value    FORM
        i=1,...,NL   is the number of “successes” and the second value
                     is the associated stimulus level. The number of
                     “successes” must be in integer format. Repeat
                     this record type as required to enter all
                     “success” data.

5B      FNB(i),      Enter response data for “failures”. Data            See      FORM
        B(i),        consists of ML pairs of numbers. The first value    FORM
        i=1,...,ML   is the number of “failures” and the second value
                     is the associated stimulus level. The number of
                     “failures” must be in integer format. Repeat this
                     record type as required to enter all “failure”
                     data.

COMMENTS 

The parameters $\mu$(LD50) and $\sigma$ of the assumed normal distribution of critical stimulus
levels are not estimated directly by the procedure given in Reference 1. A transformation
is made to the variables,

$$\alpha = \mu / \sigma \quad \text{and} \quad \beta = 1 / \sigma,$$

as a simplification to the iterative Newton-Raphson solution procedure. Therefore, $\alpha$ and
$\beta$ are obtained directly by the program and retransformed to the estimates of $\mu$ and $\sigma$. Starting
values for $\alpha$ and $\beta$ are required by the solution procedure and must be input by the user or
computed  by  the  program  under  the  default  algorithm.  Experience  has  shown  that  the 
default  algorithm  generates  satisfactory  values  in  almost  every  case. 
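
The transformation and its inverse are simple enough to sketch. The Python fragment below is illustrative only and is not the LD50EST code: the function names are hypothetical, and it assumes that the probability of a success at stimulus level x is the standard normal distribution function evaluated at (x - mu)/sigma, which equals beta*x - alpha in the transformed variables.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def to_alpha_beta(mu, sigma):
    """Transform (mu, sigma) to the working variables (alpha, beta)."""
    return mu / sigma, 1.0 / sigma


def to_mu_sigma(alpha, beta):
    """Retransform the working variables back to (mu, sigma)."""
    return alpha / beta, 1.0 / beta


def success_probability(x, alpha, beta):
    """Assumed response model: P(success at x) = Phi(beta*x - alpha)."""
    return STD_NORMAL.cdf(beta * x - alpha)


def stimulus_for_probability(p, alpha, beta):
    """Point estimate of the stimulus level giving success probability p."""
    return (alpha + STD_NORMAL.inv_cdf(p)) / beta


alpha, beta = to_alpha_beta(50.0, 5.0)
print(alpha, beta, stimulus_for_probability(0.5, alpha, beta))
```

In the transformed variables the argument of the normal distribution function is linear in alpha and beta, which is one way to see why the transformation simplifies the Newton-Raphson iteration.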

For data sets consisting of stimulus levels near zero, it is possible to obtain estimates
of $\mu$(LD50) which are negative if the ZMR is relatively large. In these cases, the negative
value of the estimate is printed parenthetically and a value of zero is printed as a “working”
estimate of $\mu$(LD50). In subsequent calculations for point and interval estimation of
stimulus levels for selected success probabilities, the value of zero is used for $\mu$(LD50).
Any negative estimate for a point or interval bound is printed as a zero in the output. The
foregoing is based on the assumption that only positive stimulus values are possible.
Although the solution algorithm will process negative values, a situation with both positive
and negative stimulus levels does not readily come to mind. Consequently, the user is
advised to restrict the input stimulus levels to positive values and to treat zero-value
estimates cautiously.

FFAC2K


PURPOSE 

Program FFAC2K (Fractional Factorial Experiments with 2 levels of each of k factors)
computes estimates of the factorial effects and the analysis of variance (anova) table for
factorial experiments of the $2^k$ configuration. Factorial experiments involve a particular
arrangement of factor level combinations in which each level of every factor is crossed with
each level of every other factor. Factorial experiments with k factors at each of two levels
are quite common in experimental design and are placed in the special category of $2^k$
experiments. The classical “Yates Procedure” (References 1 and 2) is used to perform the
computations  and  an  optional  computational  check  on  these  results  is  available  to  the  user. 
Up  to  10  factors  may  be  considered  for  each  full  factorial  experiment. 
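
For readers unfamiliar with the Yates Procedure, the following Python sketch shows the textbook form of the algorithm. It is not the FFAC2K source code; the function names are hypothetical, and the divisors used for the effects and sums of squares are the standard ones for a $2^k$ design with `reps` replications, stated here as an assumption about what the program reports.

```python
def yates_contrasts(totals):
    """Apply the Yates Procedure to treatment totals given in standard order.

    Returns [grand total, contrast for A, B, AB, C, ...] in standard order.
    """
    n = len(totals)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("need 2**k treatment totals in standard order")
    col = list(totals)
    for _ in range(k):
        # Top half of the new column: sums of successive pairs.
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        # Bottom half: differences (second minus first) of the same pairs.
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs
    return col


def effects_and_sums_of_squares(totals, reps=1):
    """Estimated effects and sums of squares from the Yates contrasts."""
    n = len(totals)
    contrasts = yates_contrasts(totals)[1:]  # drop the grand total
    effects = [c / (reps * n / 2) for c in contrasts]
    sums_of_squares = [c * c / (reps * n) for c in contrasts]
    return effects, sums_of_squares


# Totals for (1), a, b, ab in a single-replicate 2**2 experiment.
print(yates_contrasts([1, 2, 3, 6]))
```

For the four totals shown, the A contrast is (a + ab) - ((1) + b) = 4, which agrees with the second entry produced by the procedure.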

The program also has the capability to provide for analysis of fractional replication of
factorial experiments of the $2^k$ type. Fractional replication of a factorial experiment
requires less experimentation than the corresponding full factorial. However, this results in
confounding  or  aliasing  of  factorial  effects  in  the  analysis  of  variance  table.  Confounding 
(aliasing)  means  that  two  or  more  effects  are  estimated  by  the  same  arithmetic  function  of 
the  response  data.  In  order  to  design  or  analyze  fractional  experiments  one  must  be  aware 
of  the  alias  structure  of  the  factorial  effects.  The  program  FFAC2K  can  generate  the  alias 
structure  based  on  the  user  supplied  fundamental  identity  (sometimes  called  a  defining 
contrast)  associated  with  a  particular  fractional  design. 

References 1, 2 and 3 include discussions of factorial and fractional factorial experiments
and their analyses. Many designs of fractional factorials of the $2^k$ configuration are
given in Reference 4.


FEATURES 

The  output  features  of  FFAC2K  include  the  following: 

*  Printout  of  the  input  observations  of  the  response  variable  and  the  associated 
factor  level  combination. 

*  Optional  computational  check  on  the  Yates  Procedure. 

*  Printout  of  the  analysis  of  variance  table  including  for  each  factorial  effect,  the 
estimated  effect,  the  sum  of  squares,  the  F  statistic  and  the  probability  level  at 
which  the  effect  is  significant. 

*  The  sum  of  squares,  degrees  of  freedom  and  mean  square  for  the  experimental 
error. 

*  The defining contrast and alias structure as optional printout.

INPUT GUIDE


The  specifications  of  the  user-created  input  file  are  given  below.  Record  types  1,  2,  3 
and  either  4  or  5  are  mandatory.  Record  type  4  contains  the  observations  of  the  response 
variable  and  must  be  included  if  the  program  is  being  used  for  analysis  of  either  a  full  or 
fractional  factorial  experiment.  Record  type  5  contains  the  defining  contrast  and  must  be 
included if the alias structure of a fractional experiment is desired. If both the analysis and
alias structure are desired then both record types 4 and 5 must be included.

Record
Type    Variable     Description                                         Columns  Format

1       ID           Problem description.                                1-72     9A8

2       NFAC         Number of experimental factors. In a full $2^k$     1-2      I2
                     factorial this value is k. For a fractional
                     factorial, such as a (1/2)**m replicate of a $2^k$
                     factorial, this value is k-m. See the COMMENTS
                     section.
                     (NFAC ≤ 10)

        NREP         Number of replications of the full or fractional    3-6      I4
                     factorial experiment.
                     (NREP < 100)
                     If only the alias structure is desired, NREP
                     must be 0 or blank and ALS (columns 7-8) must
                     be 1. In this case the defining contrast (record
                     type 5) must be included and the response data
                     (record type 4) must be omitted.

        ALS          = 0, Alias structure will not be generated with     7-8      I2
                     the analysis. The defining contrast (record
                     type 5) must be omitted.
                     = 1, Alias structure will be generated with the
                     analysis. The defining contrast (record type 5)
                     must be included. If only the alias structure is
                     desired (NREP = 0), a one must be entered here.

        NYP          = 0, Computational check on the Yates Procedure     9-10     I2
                     will not be done.
                     = 1, Computational check on the Yates Procedure
                     will be done.

        NFN          The number, m, in the expression (1/2)**m, which    11-12    I2
                     represents the amount of fractionation. For
                     example, enter 0 for full replication:
                     (1/2)**0 = 1; 1 for 1/2 replicate: (1/2)**1 = 1/2;
                     2 for 1/4 replicate: (1/2)**2 = 1/4; etc.

        FMT          Format (in parentheses) for reading response        17-80    8A8
                     data.

3       FAC(1)       Letter assigned to the first factor.                1        A1

        FAC(2)       Letter assigned to the second factor.               2        A1

        ...

        FAC(10)      Letter assigned to the tenth factor.                10       A1

The NFAC letters must be entered in alphabetical order. NFAC ≤ 10. See the
COMMENTS section with respect to assignment of letters to factors in the case
of fractional replication.

If NREP > 0 (on record type 2), record type 4 must be included.

4       T(j),        Enter the data observations according to FMT.                FMT
        j=1,...,     Data must be in the “standard order” in the
        NREP         context of factorial experiments. See COMMENTS
                     section. In the case of multiple replications,
                     all observations from the same factor level
                     combination must be entered consecutively. Then
                     the observations for the factor level combination
                     next in line with respect to the “standard order”
                     are entered, etc.

If NREP = 0 and ALS = 1 (on record type 2), record type 5 must be included.

5       NDC          Number of effects in the defining contrast. See     1-3      I3
                     COMMENTS section for discussion of effect and
                     defining contrast.

        DEFCON(1)    First effect of the defining contrast (left         5-20     2A8
                     justified).

        DEFCON(2)    Second effect of the defining contrast (left        22-37    2A8
                     justified).

        DEFCON(3)    Third effect of the defining contrast (left         39-54    2A8
                     justified).

        DEFCON(4)    Fourth effect of the defining contrast (left        56-71    2A8
                     justified).

If  NDC  >  4,  additional  effects  are  indicated  on  subsequent  records  according  to  the  format 
(2A8,  3(1X,2A8)),  i.e.,  fifth,  sixth,  seventh  and  eighth  effects  beginning  in  columns  1,  18, 
35 and 52, respectively, etc. Within each effect the factor letters must be arranged
alphabetically, e.g., AB rather than BA. All effects must be left justified.


COMMENTS 

In $2^k$ factorial experiments, k factors, each at two levels, are crossed with each other to
produce $2^k$ factor level combinations. Upper case letters are used to name the factors.
Factor level combinations are designated by using corresponding lower case letters. The
presence of a lower case letter in the designation of a factor level combination indicates the
corresponding factor is at the “high” level. The lower case letter is omitted from the
designation if the factor is at the “low” level.

Consider an experiment with four factors represented by A, B, C and D. There are
$2^4 = 16$ factor level combinations in this experiment. In the “standard order” of these 16
combinations the first combination has all factors at the “low” level and is conventionally
designated by (1). The second combination is obtained by introducing factor A at the
“high” level. Multiplying the first factor level combination by factor A at the “high” level
(1 x a) generates the second factor level combination, a, in the “standard order”. (For
combination “a” all factors are at the “low” level except factor A.) The next two factor
level combinations are obtained by introducing factor B at the “high” level. Multiplying
combinations one and two by factor B at the “high” level (1 x b and a x b) generates
combinations b and ab. Proceeding similarly with factor C, multiplying combinations (1),
a, b and ab each by c generates combinations c, ac, bc and abc. After completing the
generation scheme with factor D, the complete list of factor level combinations in “standard
order” becomes (1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd and abcd. The
response  data  must  be  entered  into  the  program  in  this  order. 
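
The generation scheme just described, in which each new factor is multiplied into every combination produced so far, can be sketched in a few lines of Python. The function name `standard_order` is hypothetical and the fragment is offered only as a check on hand-built input files, not as part of FFAC2K.

```python
def standard_order(factors):
    """List the factor level combinations of a 2**k design in standard order.

    factors: the upper case factor letters in order, e.g. "ABCD".
    """
    combos = ["(1)"]  # all factors at the "low" level
    for factor in factors:
        letter = factor.lower()
        # Multiply every existing combination by the new factor at "high".
        combos += [letter if c == "(1)" else c + letter for c in combos]
    return combos


print(", ".join(standard_order("ABCD")))
```

Run with the four factors A, B, C and D, the sketch reproduces the 16 combinations in the order listed above.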

For fractional factorial experiments the number of treatment combinations is reduced
by the amount of fractionation. For example, in a 1/2 replicate of a $2^5$ experiment, the
number of factor level combinations performed will be $(1/2) \times 2^5 = 2^{5-1} = 2^4 = 16$, rather than
$2^5 = 32$, the number required by a non-fractionated experiment. Effectively, this reduces
the experiment to the equivalent of a pseudo $2^4$ full factorial experiment. When using the
program to analyze fractional data, one must consider this reduction of factor level
combinations in the preparation of the input file. Obviously, the input parameters NFAC
and FAC(i) are affected but the “standard order” of factor level combinations is also
affected, as is discussed below.

Using a half-replicate of a $2^4$ experiment with the four-factor interaction effect, ABCD,
as the defining contrast for an example, NFAC = 3 and FAC(1), FAC(2) and FAC(3) equal
A, B and C, respectively. The 16 treatment combinations of a full $2^4$ factorial were listed
in “standard order” above. Only 8 of these treatment combinations would be performed
in a half-replicate of a $2^4$ experiment. By use of the defining contrast the 16 combinations
can  be  divided  into  two  half-replicates  of  8  combinations  each.  For  the  interested  user  the 
algorithm  for  doing  this  is  given  in  Reference  2  on  pages  247  -  253.  The  analyst  does  not 
have  to  apply  the  algorithm,  however,  because  the  appropriate  factor  combinations  to  be 
performed  are  given  in  the  design  references  such  as  Reference  4.  The  two  half-replicates 
for  the  example  are  given  below: 

Half-rep 1:  a, b, c, abc, d, abd, acd, bcd

Half-rep 2:  (1), ab, ac, bc, ad, bd, cd, abcd

Either set is a candidate for a fractional factorial. However, in both sets a reordering must
take place to conform to the “standard order” required by the program. Because the analysis
is being done as a full replicate of a pseudo $2^3$ experiment with factors A, B and C, the
factor level combinations must be reordered ignoring factor D. Therefore, the “standard
order” for each of the half-replicates becomes:

Half-rep 1:  (d), a, b, ab(d), c, ac(d), bc(d), abc

Half-rep 2:  (1), a(d), b(d), ab, c(d), ac, bc, abc(d)

The factor level designator (d) has been placed in parentheses to indicate that it should not
be  considered  in  the  construction  of  the  “standard  order”. 
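
The split and the reordering can be reproduced mechanically. The Python sketch below is not the algorithm of Reference 2 as printed there; it assumes the common rule that a combination is assigned to a half according to whether it shares an odd or an even number of letters with the defining contrast, and the function name `half_replicates` is hypothetical.

```python
def half_replicates(factors, defining_contrast, base_factors):
    """Split a full 2**k design into two half-replicates.

    Each half is returned in the standard order of the pseudo full factorial
    on base_factors, with the remaining letters shown in parentheses.
    """
    halves = {1: [], 0: []}
    for mask in range(2 ** len(factors)):
        # Factors at the "high" level for this combination.
        high = [f for i, f in enumerate(factors) if mask >> i & 1]
        # Parity of the number of letters shared with the defining contrast.
        parity = sum(f in defining_contrast for f in high) % 2
        halves[parity].append(high)

    def base_rank(high):
        # Position in the standard order of the pseudo factorial.
        return sum(2 ** base_factors.index(f) for f in high if f in base_factors)

    def label(high):
        base = "".join(f.lower() for f in high if f in base_factors)
        extra = "".join(f.lower() for f in high if f not in base_factors)
        if extra:
            return f"{base}({extra})"
        return base or "(1)"

    # Odd-parity half first, then even-parity half.
    return [[label(h) for h in sorted(halves[p], key=base_rank)] for p in (1, 0)]


for half in half_replicates("ABCD", "ABCD", "ABC"):
    print(", ".join(half))
```

With ABCD as the defining contrast and A, B and C as the pseudo factors, the two printed lines match the reordered half-replicates given above.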

Some additional remarks are necessary with respect to the defining contrast. In a
full $2^k$ factorial there are $2^k - 1$ experimental effects that can be estimated. These effects
can be designated by factor letters and combinations of factor letters. In the example of a
full $2^4$ factorial the 15 effects are the main effects, A, B, C and D; the two-factor
interactions, AB, AC, AD, BC, BD and CD; the three-factor interactions, ABC, ABD, ACD and
BCD; and the four-factor interaction, ABCD. To define a half-replicate one of these effects
must be chosen as the defining contrast and in the example, the ABCD effect was selected.
The alias structure can then be generated by obtaining the “generalized interaction” of each
experimental effect with the defining contrast. The generalized interaction of two effects
is the algebraic product of their letter designations where the exponents are reduced
modulo 2. The alias of the effect A in the example is its generalized interaction with
ABCD which is $A^2BCD$ or simply BCD. The alias of BCD, obviously, is A which is
equivalent to the generalized interaction $AB^2C^2D^2$.

For fractionation beyond half-replication, the generalized interaction is useful in
identifying all effects in the defining contrast. For example, consider a quarter-replicate of
a $2^6$ factorial with factors A, B, C, D, E and F. In order to subdivide the $2^6 = 64$ treatment
combinations into four quarter-replicates each containing 16 combinations, two effects
must be chosen for the defining contrast. However, a third effect, defined by the generalized
interaction of the chosen two effects, automatically becomes a part of the defining
contrast. If ABCE and ABDF are selected as the defining contrast then $A^2B^2CDEF$ =
CDEF  also  becomes  part  of  the  defining  contrast. 
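
Because exponents are reduced modulo 2, the generalized interaction amounts to keeping the letters that appear an odd number of times. A minimal Python sketch follows; the function names are hypothetical and the code is an illustration of the rule, not the FFAC2K implementation.

```python
from itertools import combinations


def generalized_interaction(*effects):
    """Product of effect names with exponents reduced modulo 2."""
    letters = set()
    for effect in effects:
        # Symmetric difference: a letter appearing twice cancels out.
        letters ^= set(effect)
    return "".join(sorted(letters)) or "I"


def full_defining_contrast(chosen):
    """All effects in the defining contrast implied by the chosen effects."""
    members = []
    for size in range(1, len(chosen) + 1):
        for subset in combinations(chosen, size):
            members.append(generalized_interaction(*subset))
    return members


print(generalized_interaction("A", "ABCD"))
print(full_defining_contrast(["ABCE", "ABDF"]))
```

The second call lists the two chosen effects followed by the effect they imply, in agreement with the quarter-replicate example.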

Additionally,  care  must  be  taken  when  assigning  letters  to  factors  on  Record  Type  3 
for  fractional  replication.  The  assignment  of  letters  must  be  such  that  it  does  not  produce 
a  factorial  effect  name  which  is  identical  with  any  of  the  effects  comprising  the  defining 
contrast.  This  would  generate  an  incomplete  alias  structure  and,  therefore,  is  flagged  by 
the  program.  In  this  instance  the  user  is  advised  to  consider  a  different  letter  assignment 
or  a  different  defining  contrast. 
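
The restriction can be checked before an input file is submitted. The Python sketch below builds the alias structure for a set of pseudo factors and reports any pseudo-factorial effect whose name coincides with an effect in the defining contrast. The function names are hypothetical, the defining contrast I = ABDE = BCDE = AC is used purely as an illustration, and the sketch does not claim to reproduce the program's own flagging logic.

```python
from itertools import combinations


def product(effect_1, effect_2):
    """Generalized interaction of two effects (exponents modulo 2)."""
    return "".join(sorted(set(effect_1) ^ set(effect_2))) or "I"


def alias_structure(base_factors, defining_contrast):
    """Aliases of every effect of the pseudo factorial on base_factors.

    Returns (aliases, flagged), where flagged lists the effects whose names
    are identical to an effect in the defining contrast.
    """
    effects = ["".join(c)
               for size in range(1, len(base_factors) + 1)
               for c in combinations(base_factors, size)]
    flagged = [e for e in effects if e in defining_contrast]
    aliases = {e: [product(e, word) for word in defining_contrast] for e in effects}
    return aliases, flagged


contrast = ["ABDE", "BCDE", "AC"]
for base in ("ABC", "ABD"):
    aliases, flagged = alias_structure(base, contrast)
    print(base, "flagged:", flagged)
```

With pseudo factors A, B and C the effect AC is reported, since it is itself part of the defining contrast; with A, B and D nothing is reported.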

An example may help the user understand this restriction on naming the factors.
Consider a 1/4 replicate of a $2^5$ factorial to be analyzed as a pseudo $2^3$ factorial. Assume
a letter assignment of (A, B, C) for the three factors of the pseudo $2^3$ factorial and (D, E)
for the two remaining factors introduced by the defining contrast of

I = ABDE = BCDE = AC.

Without the flag the program would generate the following incomplete alias structure in
terms of the seven factorial effects from the pseudo $2^3$ factorial:



A   = BDE   = ABCDE = C          AB = DE   = ACDE = BC

B   = ADE   = CDE   = ABC        AC = BCDE = ABDE = I

C   = ABCDE = BDE   = A          BC = ACDE = DE   = AB

ABC = CDE   = ADE   = B

Several  alias  relationships  are  missing  from  this  alias  structure  while  others  are  duplicated. 
In  this  situation  the  user  is  instructed  to  redo  the  input  file.  The  user  may  choose  to  rename 
the  factors  or  specify  a  different  defining  contrast.  A  suggested  choice  for  this  example  is 
to retain A and B while replacing C with D for the three factors of the pseudo $2^3$ factorial.
The  factors  C  and  E  are  then  introduced  by  the  given  defining  contrast.  This  naming 
convention  in  conjunction  with  the  same  defining  contrast  yields  the  following  alias 
structure: 

A   = BDE = ABCDE = C            AB = DE = ACDE = BC

B   = ADE = CDE   = ABC          AD = BE = ABCE = CD

D   = ABE = BCE   = ACD          BD = AE = CE   = ABCD

ABD = E   = ACE   = BCD

The  original  alias  relationships  have  been  retained  while  providing  for  the  previously 
missing ones to be generated.

SUBROUTINES

RANDOM NUMBER GENERATION

The  property  of  randomness  is  a  key  element  in  many  areas  of  scientific  research  and 
application.  Random  numbers  generated  on  a  digital  computer  are  used  in  several  ways  in 
addition to sampling from specific distributions or populations. Other uses include simulation
studies in which the behavior of a system that contains random components is
modeled  and  computer  program  checkout  in  which  combinations  of  input  parameters  used 
to  test  the  code  are  randomly  selected. 

When  speaking  of  random  numbers  one  is  usually  referring  to  a  sequence  of  numbers 
which obeys some probability law; for example, the sequence $X_1, X_2, \ldots, X_n$ might represent
n independent numbers drawn from a continuous uniform distribution over the interval
(a,b). These U(a,b) numbers are referred to as “uniform random numbers” or “uniform
random variates.” The need for random numbers makes the availability of a readily accessible
set of algorithms for generating random numbers from a wide variety of probability
distributions on a digital computer quite desirable. STATLIB provides such a facility
through inclusion of a set of twenty-four subroutines for random number generation from
both discrete and continuous distributions. While this set is not exhaustive, it does include
virtually all of the common distributions required for most user purposes. The basis for the
generation scheme used in each of these subroutines is an algorithm for generating a sequence
of independent uniform random variates on the interval (0,1). Probability theory
establishes  the  fact  that  variates  can  be  generated  from  a  large  number  of  distributions 
provided  that  such  a  sequence  can  be  generated.  The  random  number  subroutines  in 
STATLIB  employ  subroutine  URNG,  currently  available  on  the  CDC  system  in  the 
MATHLIB  library,  to  produce  the  required  U(0,1)  variates. 
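
One standard way in which U(0,1) variates yield variates from other distributions is the inverse transform method: a uniform variate is passed through the inverse of the target distribution function. The Python sketch below illustrates the idea for the U(a,b) and exponential distributions. It is not taken from STATLIB; Python's `random.Random` stands in for subroutine URNG, and the function names are hypothetical.

```python
import math
import random


def uniform_ab(u, a, b):
    """Map a U(0,1) variate u to a U(a,b) variate."""
    return a + (b - a) * u


def exponential(u, mean):
    """Map a U(0,1) variate u to an exponential variate with the given mean.

    Inverts the distribution function F(x) = 1 - exp(-x / mean).
    """
    return -mean * math.log(1.0 - u)


rng = random.Random(12345)  # stand-in for URNG; any U(0,1) source would do
sample = [exponential(rng.random(), 2.0) for _ in range(20000)]
print(sum(sample) / len(sample))  # should be close to the theoretical mean 2.0
```

Comparing the sample mean with the theoretical mean, as suggested later in this section, is a quick sanity check on any generated set of variates.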

Documentation  on  each  of  the  random  number  generation  subroutines  in  STATLIB 
follows  in  the  ensuing  pages.  Each  subroutine  is  designed  to  enable  the  user  to  generate  a 
set  of  random  variates  from  the  desired  distribution.  In  the  description  of  each  subroutine 
the  functional  form  of  the  probability  distribution  in  question  is  displayed  whenever 
deemed  appropriate.  Oftentimes  there  is  more  than  one  accepted  functional  form  in  the 
literature.  In  such  cases  only  one  form  has  been  selected  and  displayed.  In  addition,  the 
mean and variance of each distribution are given in most cases to allow the user to check the
sample mean and variance from the generated set of random variates against their
theoretical values if so desired.

In the call line for each subroutine FORTRAN naming conventions have been followed
for all variables. Hence, variables whose first letter is in the range I to N are integers while
all other variables are real. In the input guide for each subroutine the call line is listed
twice, once on a single line and once in currently accepted structured programming form.
For example, the call line for subroutine RANUWO is
CALL RANUWO (N,IA,IB,K,ISEED,IC,IX,IERROR),
while in structured programming form it appears as

CALL RANUWO
G      ( N , IA , IB , K
B      , ISEED , IC
Y      , IX , IERROR )

This  structure  is  used  in  order  to  facilitate  the  identification  of  those  arguments  for  which 
the  user  must  provide  input  values.  There  are  three  categories  of  arguments:  given  (G), 
both  (B),  and  yielded  (Y).  The  given  (G)  arguments  require  input  values  provided  by  the 
user.  These  argument  values  will  not  change  during  execution.  The  yielded  (Y)  arguments 
identify  output  values  returned  by  the  subroutine.  Hence,  they  represent  created  values  and 
as  such  require  no  user  action.  The  both  (B)  category  refers  to  arguments  whose  values 
are  modified  during  execution  of  the  subroutine.  Hence,  these  arguments  may  or  may  not 
require  the  user  to  input  values.  The  input  guide  provided  with  each  subroutine  description 
clarifies which of these arguments require input and which do not. For example, in subroutine
RANUWO a value for the argument ISEED must be input by the user, while no
input  is  required  for  the  argument  IC.  IC  is  an  array  of  dimension  K  whose  values  are 
initialized  to  zero  within  RANUWO  and  then  updated  repeatedly  before  execution  is 
completed.  IC  is  included  in  the  argument  list  so  that  its  size  does  not  have  to  be  restricted 
in  the  subroutine  dimension  statement. 

Two  arguments  which  are  common  to  all  of  the  random  number  generation  subroutines 
require  some  discussion.  The  last  argument  in  each  subroutine  call  line  is  an  input  error 
flag.  When  a  subroutine  is  called,  if  no  input  errors  are  detected  then  this  argument  is  set 
to  zero.  If  an  input  error  is  detected,  the  value  of  this  argument  indicates  which  specific 
input  error  has  been  made.  The  argument  ISEED  is,  on  input,  an  integer  “seed”  used  to 
initialize  a  sequence  of  U(0,1)  variates  generated  by  subroutine  URNG.  Recall  that  each 
random  number  generation  subroutine  in  STATLIB  utilizes  these  U(0,1)  variates  in  its 
generation  scheme.  On  output,  ISEED  is  a  new  seed  available  for  generating  additional 
U(0,1) variates from subroutine URNG. The input value for ISEED must be such that 1 <
ISEED < $2^{31} - 1$. A given value of ISEED always initiates the same set of U(0,1) variates.
Note  that  it  is  feasible  to  specify  only  one  input  value  of  ISEED  in  a  main  program  in  order 
to  generate  successive  blocks  of  random  variates  from  different  STATLIB  subroutines.  For 
example,  suppose  a  main  program  requires  25  gamma  random  variates  followed  by  50 
normal  random  variates.  One  possible  set  of  instructions  would  be 

ISEED  =  5437 

CALL  RANGAM  (25,A,B,ISEED,X,IERRORX) 

CALL  RANNOR  (50,FMU,SIG,ISEED,Y,IERRORY) 
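
The hand-off of the seed between successive calls can be illustrated with a small Python sketch. It is not the URNG algorithm: the multiplier 16807 and modulus $2^{31} - 1$ are an assumed, widely used multiplicative congruential choice, and `uniform_block` is a hypothetical stand-in that returns uniform variates where RANGAM and RANNOR would return gamma and normal variates.

```python
MODULUS = 2 ** 31 - 1   # assumed modulus, chosen to match the stated seed range
MULTIPLIER = 16807      # assumed multiplier; URNG's actual constant is not given here


def urng(seed):
    """Return one U(0,1) variate and the updated seed."""
    new_seed = (MULTIPLIER * seed) % MODULUS
    return new_seed / MODULUS, new_seed


def uniform_block(n, seed):
    """Generate n variates, returning them with the seed left for the next call."""
    variates = []
    for _ in range(n):
        u, seed = urng(seed)
        variates.append(u)
    return variates, seed


iseed = 5437
first, iseed = uniform_block(25, iseed)    # plays the role of the RANGAM call
second, iseed = uniform_block(50, iseed)   # plays the role of the RANNOR call
print(iseed)
```

Because the second block starts from the seed left by the first, the 75 variates are the same as those produced by a single block of 75 started from the original seed.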

The  value  of  ISEED  is  updated  each  time  subroutine  URNG  is  called  by  subroutines 
RANGAM  and  RANNOR.  The  last  of  these  updated  values  from  subroutine  RANGAM 
will be the first ISEED value used by subroutine RANNOR.

One final comment regarding random number generation is in order. The sequence of
U(0,1) variates used in generating variates from other distributions is itself produced
using the arithmetic operations of a digital computer in a recursive scheme. (Subroutine
URNG is based on a multiplicative congruential generation scheme.) As such, once the seed
value is specified, the U(0,1) sequence is completely determined. Such sequences are not
really random, but they appear to be, and are in fact referred to as pseudo-random. They
are called random with the above understanding. For typical applications these deterministic
sequences may be considered random if they satisfy certain statistical properties of
randomness. Subroutine URNG has been thoroughly tested and clearly exhibits all of the
critical randomness properties. It turns out that the random number generation subroutines
included in STATLIB, all of which utilize U(0,1) numbers from subroutine URNG, also
satisfy these randomness properties.

RANARB


PURPOSE 

Subroutine RANARB generates n random variates from an arbitrarily specified discrete
distribution with replacement. The user specifies the values of the random variable
and their associated probabilities of occurrence. If a discrete distribution assumes the
values $x_1, x_2, \ldots, x_k$ with corresponding probabilities $p_1, p_2, \ldots, p_k$, where
$\sum_{i=1}^{k} p_i = 1$, then the form of the distribution is

$$p(x_i) = p_i, \quad i = 1, 2, \ldots, k$$

The  values  of  the  random  variable  may  be  integer  and/or  real,  but  they  are  treated  as  reals 
in  RANARB. 


FEATURES 

The  input  arrays  of  variate  values  and  their  associated  probabilities  are  simultaneously 
reordered  in  RANARB  to  make  the  generation  scheme  computationally  efficient. 
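
A generation scheme of this kind can be sketched as a table lookup on cumulative probabilities. The Python fragment below is an illustration, not the RANARB source: sorting by decreasing probability is an assumption about what the reordering achieves (the most likely values are tested first), Python's `random.Random` stands in for URNG, and the function name and error handling are hypothetical.

```python
import random
from itertools import accumulate


def ranarb(n, values, probs, seed):
    """Draw n variates from the discrete distribution (values, probs)."""
    if n <= 0 or not values or len(values) != len(probs):
        raise ValueError("counts out of range")
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("at least one probability out of range")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities fail to sum to unity")

    # Reorder both arrays together, largest probability first (assumed).
    pairs = sorted(zip(probs, values), key=lambda pair: pair[0], reverse=True)
    cumulative = list(accumulate(p for p, _ in pairs))

    rng = random.Random(seed)  # stand-in for URNG
    variates = []
    for _ in range(n):
        u = rng.random()
        for cum, (_, value) in zip(cumulative, pairs):
            if u < cum:
                variates.append(float(value))
                break
        else:
            # Guard against rounding in the final cumulative sum.
            variates.append(float(pairs[-1][1]))
    return variates


draws = ranarb(10000, [1.0, 2.0, 3.0], [0.2, 0.5, 0.3], 5437)
print(draws.count(2.0) / len(draws))  # should be close to 0.5
```

With the largest probabilities first, the inner search usually ends after one or two comparisons, which is the kind of saving the reordering is meant to provide.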


REFERENCE 

1.  Knuth, D. E. (1981), The Art of Computer Programming, Second Edition,
Addison-Wesley,  Volume  2  /  Seminumerical  Algorithms,  p.  115  and  Volume  1  / 
Fundamental  Algorithms,  pp.  399  -  404. 


INPUT  GUIDE 

The call line for subroutine RANARB is
CALL RANARB (N,K,X1,P,ISEED,X,XL,IERROR)
or, in structured programming form,

CALL RANARB
G      ( N , K
B      , X1 , P , ISEED
Y      , X , XL , IERROR )

The parameter list is given below:

Given  arguments 

N       :  Number of variates to be generated
           (N > 0)

K       :  Number of values that the discrete random variable can assume
           (K > 0)

Both

X1(K)   :  Input array of dimension K containing the values of the discrete
           random variable

P(K)    :  Input array of dimension K containing the probabilities with
           which the discrete random variable can assume each of its K
           possible values
           (0.0 < P(I) < 1.0, I = 1,K)

ISEED   :  Integer seed
           (1 < ISEED < $2^{31} - 1$)

Yielded arguments

X(N)    :  Output array of dimension N containing the generated variates

XL(K)   :  Auxiliary array of dimension K containing cumulative sums
           of the P(I)’s used in the generation scheme

IERROR  :  Input error flag
           0 : No input errors
           1 : Number of variates out of range
           2 : Number of variate values out of range
           3 : At least one probability out of range
           4 : Probabilities fail to sum to unity
           5 : Seed out of range

COMMENTS 

The  user  must  take  care  to  ensure  that  the  input  probabilities  sum  to  one  to  yield  a  valid 
probability  distribution. 
RANBER


PURPOSE

Subroutine RANBER generates n Bernoulli random variates (i.e., 0’s and 1’s) from a
Bernoulli distribution with parameter p (probability of success on a single trial). The form
of the Bernoulli distribution used is

$$p(x) = p^{x} (1 - p)^{1 - x} , \quad x = 0, 1$$

with

mean = p

and

variance = p(1 - p)


REFERENCE 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation , 
John  Wiley  &  Sons,  Inc.,  pp.  168  -  169. 

INPUT  GUIDE 

The  call  line  for  subroutine  RANBER  is 
CALL  RANBER  (N,P,ISEED,IX,IERROR) 
or,  in  structured  programming  form, 

CALL  RANBER 

G  (N  ,  P 

B  ,  ISEED 

Y  ,  IX  ,  IERROR )

The  parameter  list  is  given  below: 

Given arguments

N  :  Number of variates to be generated
      (N > 0)

P  :  Probability of success on a single trial
      (0.0 < P < 1.0)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

IX(N)  :  Output array of dimension N containing the generated Bernoulli variates

IERROR  :  Input error flag
           ( 0 : No input errors
             1 : Number of variates out of range
             2 : Probability of success out of range
             3 : Seed out of range )
RANBIN


PURPOSE

Subroutine RANBIN generates n binomial random variates from a binomial distribution
with parameters nt (number of trials) and p (probability of success on a single trial).
The form of the binomial distribution used is

$$p(x) = \binom{nt}{x} p^{x} (1 - p)^{nt - x} , \quad x = 0, 1, \ldots, nt$$

The mean and variance of the binomial are
mean = nt · p
variance = nt · p(1 - p)


FEATURES 

The generation scheme used in RANBIN is in three parts. If the number of trials (nt)
is less than or equal to 100, the generation scheme is based on summing Bernoulli variates.
If nt is greater than 100, the generation scheme is based on either the normal approximation
to the binomial or the Poisson approximation to the binomial. If nt · p is greater than 5 and
p is less than or equal to 0.5, or if nt(1 - p) is greater than 5 and p is greater than 0.5, the
normal approximation is used. Otherwise the Poisson approximation is employed.
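
The selection rule described above can be written out as a small Python sketch. The function names are hypothetical and the code only mirrors the rule as stated in the text; it is not the RANBIN source, and Python's `random` module stands in for URNG.

```python
import random

def ranbin_method(nt, p):
    """Return the scheme that the FEATURES text selects for (nt, p)."""
    if nt <= 100:
        return "bernoulli-sum"
    if (p <= 0.5 and nt * p > 5) or (p > 0.5 and nt * (1 - p) > 5):
        return "normal"
    return "poisson"

def bernoulli_sum(nt, p, rng):
    """One binomial variate as the sum of nt Bernoulli(p) variates."""
    return sum(1 for _ in range(nt) if rng.random() < p)
```

For instance, `ranbin_method(1000, 0.2)` selects the normal approximation because nt · p = 200 exceeds 5, while `ranbin_method(1000, 0.001)` falls through to the Poisson approximation.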


REFERENCES 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation, 
John  Wiley  &  Sons,  Inc.,  p.  220,  pp.  211  -  213,  and  p.  224. 

2.  Walpole,  R.  E.  and  Myers,  R.  H.  (1985),  Probability  and  Statistics  for  Engineers  and 
Scientists,  Third  Edition,  Macmillan,  p.  126. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANBIN  is 
CALL RANBIN (N,NT,P,ISEED,IX,IERROR)
or, in structured programming form,

CALL RANBIN

( N  ,  NT  ,  P 

,  ISEED 

,  IX  ,  IERROR  ) 

The  parameter  list  is  given  below: 

Given  arguments 

N  :  Number  of  variates  to  be  generated 

(N  >  0) 

NT  :  Number  of  trials 

(NT  >  0) 

P  :  Probability  of  success  on  a  single  trial 

(0.0 < P < 1.0)

Both

ISEED  :  Integer seed

(1 < ISEED < $2^{31} - 1$)

Yielded arguments

IX(N)  :  Output array of dimension N containing the generated binomial variates

IERROR  :  Input  error  flag 

(  0  :  No  input  errors 

1  :  Number  of  variates  out  of  range 

2  :  Number  of  trials  out  of  range 

3  :  Probability  of  success  out  of  range 

4  :  Seed  out  of  range  ) 
RANGEO


PURPOSE

Subroutine RANGEO generates n geometric random variates from a geometric distribution
with parameter p (probability of success on a single trial). The form of the geometric
distribution used is

$$p(x) = p (1 - p)^{x - 1} , \quad x = 1, 2, 3, \ldots$$

with

mean = 1/p

and

variance = (1 - p)/p^2
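
A geometric variate of this form can be obtained by inverting the distribution function. The Python sketch below shows that technique under stated assumptions: the guide does not say which method RANGEO uses, the name `rangeo_sketch` is hypothetical, and Python's `random` module stands in for URNG.

```python
import math
import random

def rangeo_sketch(n, p, seed):
    """Geometric variates on 1, 2, 3, ... by inversion: ceil(ln U / ln(1 - p))."""
    if n <= 0 or not 0.0 < p < 1.0:
        raise ValueError("n or p out of range")
    rng = random.Random(seed)
    log_q = math.log(1.0 - p)
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()                 # u lies in (0, 1]
        x = math.ceil(math.log(u) / log_q)
        out.append(max(1, x))                  # u == 1 would give 0; clamp to 1
    return out
```

With p = 0.25 the sample mean of a long run should be close to 1/p = 4.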


REFERENCE 

1.  Knuth, D. E. (1981), The Art of Computer Programming, Volume 2 / Seminumerical
Algorithms, Second Edition, Addison-Wesley, p. 131.


INPUT  GUIDE 

The  call  line  for  subroutine  RANGEO  is 
CALL  RANGEO  (N,P,ISEED,IX,IERROR) 
or,  in  structured  programming  form, 

CALL  RANGEO 

G  (N  ,P 

B  ,  ISEED 

Y  ,  IX  ,  IERROR )

The  parameter  list  is  given  below: 

Given  arguments 

N  :  Number  of  variates  to  be  generated 

(N  >  0) 


NSWC  TR  89-97 


P  :  Probability  of  success  on  a  single  trial 

(0.0  <  P  <  1.0) 

Both 

ISEED  :  Integer seed

(1 < ISEED < $2^{31} - 1$)

Yielded arguments

IX(N)  :  Output array of dimension N containing the generated geometric variates

IERROR  :  Input error flag

(  0  :  No  input  errors 

1  :  Number  of  variates  out  of  range 

2  :  Probability  of  success  out  of  range 

3  :  Seed  out  of  range  ) 
RANHYP


PURPOSE


Subroutine RANHYP generates n hypergeometric random variates from a hypergeometric
distribution with parameters NP (finite population size), M (number of successes in
the finite population), and ns (number of items sampled from the finite population). The
hypergeometric distribution is appropriate when sampling without replacement from a
finite population. The form of the hypergeometric distribution used is

$$p(x) = \frac{\binom{M}{x} \binom{NP - M}{ns - x}}{\binom{NP}{ns}} , \quad x = \max(0, ns - NP + M), \ldots, \min(M, ns)$$

The mean and variance of the hypergeometric are
mean = (ns · M)/NP

variance = (NP - ns)/(NP - 1) · ns · (M/NP) · (1 - (M/NP))
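
As a numerical check of the mean and variance expressions, the following Python sketch sums the probability function over its support for one arbitrary, illustrative choice of NP, M, and ns (the values 20, 7, and 5 are not taken from the report).

```python
from math import comb

def hypergeom_pmf(x, NP, M, ns):
    """Hypergeometric probability of x successes in a sample of size ns."""
    return comb(M, x) * comb(NP - M, ns - x) / comb(NP, ns)

NP, M, ns = 20, 7, 5                       # illustrative values only
lo, hi = max(0, ns - NP + M), min(M, ns)   # support limits from the text
support = range(lo, hi + 1)
pmf = [hypergeom_pmf(x, NP, M, ns) for x in support]

mean = sum(x * p for x, p in zip(support, pmf))
var = sum((x - mean) ** 2 * p for x, p in zip(support, pmf))

mean_formula = ns * M / NP
var_formula = (NP - ns) / (NP - 1) * ns * (M / NP) * (1 - M / NP)
```

The directly summed `mean` and `var` agree with `mean_formula` and `var_formula` to within floating-point round-off.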


REFERENCE 

1.  Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John  Wiley  &  Sons,  Inc.,  p.  228. 

INPUT  GUIDE 

The  call  line  for  subroutine  RANHYP  is 
CALL  RANHYP  (N,NP,M,NS,ISEED,IX,IERROR) 
or,  in  structured  programming  form, 

CALL  RANHYP 

G  (N  , NP  ,  M  ,  NS 

B  ,  ISEED 

Y  ,  IX  ,  IERROR )

The  parameter  list  is  given  below: 
Given arguments

N  :  Number of variates to be generated
      (N > 0)

NP  :  Size of the finite population from which a sample of size NS is to be taken
       (NP > 0)

M  :  Number of success items in the finite population
      (0 < M < NP)

NS  :  Number of items to be sampled from the finite population
       (0 < NS < NP)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

IX(N)  :  Output array of dimension N containing the generated hypergeometric variates

IERROR  :  Input error flag
           ( 0 : No input errors
             1 : Number of variates out of range
             2 : Finite population size out of range
             3 : Number of success items out of range
             4 : Sample size out of range
             5 : Seed out of range )
RANNBI


PURPOSE


Subroutine RANNBI generates n negative binomial random variates from a negative
binomial distribution with parameters p (probability of success on a single trial) and
k (number of successes required). The form of the negative binomial distribution used is

$$p(x) = \binom{x - 1}{k - 1} p^{k} (1 - p)^{x - k} , \quad x = k, k + 1, k + 2, \ldots , \quad k = 1, 2, 3, \ldots$$

The mean and variance of the negative binomial are
mean = k/p
variance = k(1 - p)/p^2


REFERENCES 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation, 
John  Wiley  &  Sons,  Inc.,  p.  226. 

2.  Walpole,  R.  E.  and  Myers,  R.  H.  (1985),  Probability  and  Statistics  for  Engineers  and 
Scientists,  Third  Edition,  Macmillan,  pp.  121  -  122. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANNBI  is 


CALL RANNBI (N,P,K,ISEED,IX,IERROR)
or,  in  structured  programming  form, 

CALL  RANNBI 

G  (N  ,P  ,  K 

B  ,  ISEED 

Y  ,  IX  ,  IERROR )

The  parameter  list  is  given  below: 
Given arguments

N  :  Number of variates to be generated
      (N > 0)

P  :  Probability of success on a single trial
      (0.0 < P < 1.0)

K  :  Number of successes required
      (K > 0)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

IX(N)  :  Output array of dimension N containing the generated negative binomial
          variates. Each IX value represents the number of the trial on which the
          Kth success occurs.

IERROR  :  Input error flag
           ( 0 : No input errors
             1 : Number of variates out of range
             2 : Probability of success out of range
             3 : Number of successes out of range
             4 : Seed out of range )
RANPOI


PURPOSE


Subroutine RANPOI generates n Poisson random variates from a Poisson distribution
with parameter fmu (fmu > 0). fmu represents the average rate per time (or space) interval.
The form of the Poisson distribution used is

$$p(x) = \frac{e^{-fmu} \, fmu^{x}}{x!} , \quad x = 0, 1, 2, \ldots , \quad fmu > 0$$

The mean and variance of the Poisson are

mean = fmu
variance = fmu


REFERENCE 

1.  Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John Wiley & Sons, Inc., p. 224.


INPUT  GUIDE 

The  call  line  for  subroutine  RANPOI  is 
CALL  RANPOI  (N,FMU,ISEED,IX,IERROR) 
or,  in  structured  programming  form, 

CALL  RANPOI 

G  ( N  ,  FMU

B  ,  ISEED

Y  ,  IX  ,  IERROR )

The  parameter  list  is  given  below: 

Given arguments

N  :  Number of variates to be generated
      (N > 0)

FMU  :  Average rate per time (or space) interval
        (FMU > 0)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

IX(N)  :  Output array of dimension N containing the generated Poisson variates

IERROR  :  Input error flag
           ( 0 : No input errors
             1 : Number of variates out of range
             2 : Poisson parameter FMU out of range
             3 : Seed out of range )
RANUWO

PURPOSE

Subroutine RANUWO generates a randomly ordered subset of n of the integers in the
interval IA to IB inclusive (IA < IB). All integers in the resultant random ordering are
distinct; that is, the integers are generated without replacement.
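
One common way to draw distinct integers in random order is a partial Fisher-Yates shuffle, sketched below in Python. This is a substituted technique shown for illustration, not necessarily the scheme RANUWO uses with its auxiliary array IC; the name `ranuwo_sketch` is hypothetical, and allowing n equal to the interval length is an assumption.

```python
import random

def ranuwo_sketch(n, ia, ib, seed):
    """Return n distinct integers from ia..ib in random order."""
    if ia >= ib:
        raise ValueError("improper generation interval")
    k = ib - ia + 1                      # length of the generation interval
    if not 0 < n <= k:                   # assumption: n == k is permitted here
        raise ValueError("number of variates out of range")
    rng = random.Random(seed)            # stand-in for URNG
    pool = list(range(ia, ib + 1))       # working array of candidate integers
    for i in range(n):
        j = rng.randrange(i, k)          # pick from the not-yet-chosen tail
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:n]
```

Calling `ranuwo_sketch(5, 10, 20, 42)` yields five different integers between 10 and 20; asking for all 11 returns a random permutation of the interval.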

INPUT  GUIDE 

The  call  line  for  subroutine  RANUWO  is 
CALL  RANUWO  (N,IA,IB,K,ISEED,IC,IX,IERROR) 
or,  in  structured  programming  form, 

CALL  RANUWO 

G  ( N  , IA  ,  IB  ,  K 

B  ,  ISEED  ,  IC 

Y  ,  IX  ,  IERROR )

The  parameter  list  is  given  below: 

Given  arguments 

N  :  Number  of  variates  to  be  generated 

(0  <  N  <  K) 

IA  :  Lower  limit  of  generation  interval 

(IA  <  IB) 

IB  :  Upper  limit  of  generation  interval 

(IB  >  IA) 

K  :  Length  of  generation  interval 

(K  =  IB  -  IA  +  1) 

Both 

ISEED  :  Integer  seed 

(1 < ISEED < $2^{31} - 1$)

IC(K)  :  Auxiliary  array  of  dimension  K  used  for  keeping  track  of 
previously  generated  integers 
Yielded arguments

IX(N)  :  Output array of dimension N containing the generated integers

IERROR  :  Input error flag

(  0  :  No  input  errors 

1  :  Improper  generation  interval 

2  :  Incorrect  generation  interval  width 

3  :  Number  of  variates  out  of  range 

4  :  Seed  out  of  range  ) 
RANUWR

PURPOSE

Subroutine RANUWR generates n random integers in the range IA to IB inclusive
(IA < IB) with replacement. All integers in this range are assumed to be equally likely
to occur; that is, generation is based on the discrete uniform distribution. The form of
the discrete uniform distribution used is

$$p(x) = \frac{1}{IB - IA + 1} , \quad x = IA, IA + 1, IA + 2, \ldots, IB$$

The mean and variance of the discrete uniform are
mean = IA + (IB - IA)/2
variance = ((IB - IA + 1)^2 - 1)/12
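
The mean and variance expressions can be checked by direct summation, and a variate can be formed by scaling a U(0,1) number. The Python sketch below does both for an arbitrary illustrative interval; the function name is hypothetical and the scaling step is an assumed, commonly used construction rather than a documented detail of RANUWR.

```python
import random

def ranuwr_sketch(n, ia, ib, seed):
    """n integers, equally likely on ia..ib, drawn with replacement."""
    if n <= 0 or ia >= ib:
        raise ValueError("n or interval out of range")
    rng = random.Random(seed)                        # stand-in for URNG
    width = ib - ia + 1
    return [ia + int(rng.random() * width) for _ in range(n)]

ia, ib = 3, 12                                       # illustrative interval
support = range(ia, ib + 1)
p = 1.0 / (ib - ia + 1)
mean = sum(x * p for x in support)
var = sum((x - mean) ** 2 * p for x in support)
mean_formula = ia + (ib - ia) / 2
var_formula = ((ib - ia + 1) ** 2 - 1) / 12
```

Here the summed `mean` and `var` match `mean_formula` and `var_formula` to within round-off.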

REFERENCE 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation, 
John  Wiley  &  Sons,  Inc.,  p.  219. 

INPUT  GUIDE 

The  call  line  for  subroutine  RANUWR  is 
CALL RANUWR (N,IA,IB,ISEED,IX,IERROR)
or,  in  structured  programming  form, 

CALL  RANUWR 

G  ( N  ,  IA  ,  IB 

B  ,  ISEED 

Y  ,  IX  ,  IERROR )

The  parameter  list  is  given  below: 

Given arguments

N  :  Number of variates to be generated
      (N > 0)

IA  :  Lower limit of generation interval
       (IA < IB)

IB  :  Upper limit of generation interval
       (IB > IA)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

IX(N)  :  Output array of dimension N containing the generated discrete uniform variates

IERROR  :  Input error flag
           ( 0 : No input errors
             1 : Number of variates out of range
             2 : Improper generation interval
             3 : Seed out of range )
CONTINUOUS RANDOM NUMBER GENERATORS
RANBET


PURPOSE


Subroutine RANBET generates n beta random variates from a beta distribution with
parameters a and b (a and b both > 0). The form of the beta distribution used is

$$f(x) = \frac{\Gamma(a + b)}{\Gamma(a) \, \Gamma(b)} \, x^{a - 1} (1 - x)^{b - 1} , \quad 0 < x < 1 , \quad a > 0 , \quad b > 0$$

The mean and variance of the beta distribution are
mean = a/(a + b)
variance = ab / ((a + b)^2 (a + b + 1))


FEATURES 

The  generation  scheme  used  in  RANBET  is  written  in  three  parts,  keying  upon  whether 
the  parameters  a  and  b  are  integral  or  nonintegral  as  well  as  their  magnitudes,  to  increase 
computational  efficiency. 


REFERENCE 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation , 
John  Wiley  &  Sons,  Inc.,  pp.  204  -  208  and  pp.  209  -  211. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANBET  is 
CALL  RANBET  (N,A,B,ISEED,X,IERROR) 
or,  in  structured  programming  form, 

CALL  RANBET 

G  (N  ,  A  ,  B 

B  ,  ISEED 

Y  ,  X  ,  IERROR )

The parameter list is given below:

Given  arguments 

N  :  Number  of  variates  to  be  generated 

(N  >0) 

A  :  Parameter  of  the  beta  distribution 

(A  >  0) 

B  :  Parameter  of  the  beta  distribution 

(B  >  0) 

Both 

ISEED  :  Integer  seed 

(1 < ISEED < $2^{31} - 1$)

Yielded  arguments 

X(N)  :  Output  array  of  dimension  N  containing  the  generated  beta 

variates 

IERROR  :  Input  error  flag 

(  0  :  No  input  errors 

1  :  Number  of  variates  out  of  range 

2  :  Beta  parameter  A  out  of  range 

3  :  Beta  parameter  B  out  of  range 

4  :  Seed  out  of  range  ) 
RANCSQ


PURPOSE

Subroutine RANCSQ generates n chi-square random variates from a chi-square distribution
with parameter nu (called the degrees of freedom of the distribution) where nu is
a positive integer. The form of the chi-square distribution used is

$$f(x) = \frac{1}{2^{nu/2} \, \Gamma(nu/2)} \, x^{nu/2 - 1} e^{-x/2} , \quad 0 < x < \infty , \quad nu = 1, 2, 3, \ldots$$

The mean and variance of the chi-square distribution are
mean = nu
variance = 2 · nu


FEATURES 

The generation scheme used in RANCSQ is based on the relationship between the
gamma and chi-square distributions; namely, the chi-square with nu degrees of freedom is
equivalent to a gamma distribution with parameters a = nu/2 and b = 2.
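
The gamma relationship can be illustrated with Python's standard library, whose `random.gammavariate(alpha, beta)` takes a shape `alpha` and a scale `beta`. This is a sketch of the relationship only, with a hypothetical function name; it does not reproduce RANCSQ or URNG.

```python
import random

def rancsq_sketch(n, nu, seed):
    """Chi-square variates with nu degrees of freedom as gamma(nu/2, 2) variates."""
    if n <= 0 or nu <= 0:
        raise ValueError("n or nu out of range")
    rng = random.Random(seed)                  # stand-in for URNG
    # shape a = nu/2, scale b = 2, as stated in the FEATURES text
    return [rng.gammavariate(nu / 2.0, 2.0) for _ in range(n)]
```

For nu = 4, the sample mean and variance of a long run should be near 4 and 8, respectively.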


REFERENCES 

1.  Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John  Wiley  &  Sons,  Inc.,  p.  213. 

2.  Walpole,  R.  E.  and  Myers,  R.  H.  (1985),  Probability  and  Statistics  for  Engineers  and 
Scientists,  Third  Edition,  Macmillan,  pp.  155  -  156. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANCSQ  is 
CALL  RANCSQ  (N,NU,ISEED,X,IERRORS) 
or,  in  structured  programming  form, 

CALL RANCSQ

( N  ,  NU

,  ISEED

,  X  ,  IERRORS )

The parameter list is given below:


Given arguments

N  :  Number of variates to be generated
      (N > 0)

NU  :  Degrees of freedom parameter of the chi-square distribution
       (NU > 0)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

X(N)  :  Output array of dimension N containing the generated chi-square variates

IERRORS  :  Input error flag
            ( 0 : No input errors
              1 : Number of variates out of range
              2 : Degrees of freedom parameter out of range
              3 : Seed out of range )
RANEXP


PURPOSE


Subroutine RANEXP generates n exponential random variates from an exponential
distribution with parameter a (a > 0). a is interpreted as the average rate per unit of time
or the average time to failure (average lifetime). The form of the exponential distribution
used is

$$f(x) = \frac{1}{a} \, e^{-x/a} , \quad x > 0 , \quad a > 0$$

The mean and variance of the exponential are
mean = a

variance = a^2
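
An exponential variate with mean a can be obtained from a U(0,1) number by the inverse transform x = -a ln(U). The Python sketch below shows that standard technique; the guide does not state which method RANEXP uses, and the function name is hypothetical.

```python
import math
import random

def ranexp_sketch(n, a, seed):
    """Exponential variates with mean a by the inverse transform -a * ln(U)."""
    if n <= 0 or a <= 0:
        raise ValueError("n and a must be positive")
    rng = random.Random(seed)                  # stand-in for URNG
    # 1 - U lies in (0, 1], so the logarithm is always finite.
    return [-a * math.log(1.0 - rng.random()) for _ in range(n)]
```

With a = 2.5, the sample mean of a long run should be close to 2.5.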


REFERENCE 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation , 
John  Wiley  &  Sons,  Inc.,  p.  203. 

INPUT  GUIDE 

The  call  line  for  subroutine  RANEXP  is 
CALL RANEXP (N,A,ISEED,X,IERROR)
or,  in  structured  programming  form, 

CALL  RANEXP 

G  (N  ,  A 

B  ,  ISEED 

Y  ,  X  ,  IERROR )

The  parameter  list  is  given  below: 

Given arguments

N  :  Number of variates to be generated
      (N > 0)

A  :  Mean of the exponential distribution
      (A > 0)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

X(N)  :  Output array of dimension N containing the generated exponential variates

IERROR  :  Input error flag
           ( 0 : No input errors
             1 : Number of variates out of range
             2 : Exponential parameter out of range
             3 : Seed out of range )
RANFDI


PURPOSE

Subroutine RANFDI generates n F random variates from an F distribution with parameters
IA and IB (called the numerator and denominator degrees of freedom, respectively,
of the F distribution) where both IA and IB are positive integers. The form of the
F distribution used is

$$f(x) = \frac{\Gamma[(IA + IB)/2]}{\Gamma(IA/2) \, \Gamma(IB/2)} \left( \frac{IA}{IB} \right)^{IA/2} x^{IA/2 - 1} \left( 1 + \frac{IA}{IB} \, x \right)^{-(IA + IB)/2} , \quad 0 < x < \infty$$

with IA = 1, 2, 3, ... and IB = 1, 2, 3, ...

The mean and variance of the F distribution are
mean = IB / (IB - 2)   for IB > 2

variance = (2 IB^2 (IA + IB - 2)) / (IA (IB - 2)^2 (IB - 4))   for IB > 4

FEATURES

The generation scheme used in RANFDI is based on the relationship between the beta
and F distributions; namely, the F with IA and IB degrees of freedom is equivalent to the
following expression where Y is a beta random variable with parameters a = IA / 2 and
b = IB / 2:

(IB · Y) / (IA (1 - Y))
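
The beta-to-F transformation stated under FEATURES can be sketched in Python with the standard library's `random.betavariate`. The function name is hypothetical, and the code illustrates the relationship rather than the RANFDI source.

```python
import random

def ranfdi_sketch(n, ia, ib, seed):
    """F variates with (ia, ib) degrees of freedom from beta(ia/2, ib/2) variates."""
    if n <= 0 or ia <= 0 or ib <= 0:
        raise ValueError("n, ia, or ib out of range")
    rng = random.Random(seed)                  # stand-in for URNG
    out = []
    for _ in range(n):
        y = rng.betavariate(ia / 2.0, ib / 2.0)
        out.append((ib * y) / (ia * (1.0 - y)))
    return out
```

For IA = 4 and IB = 10, the sample mean of a long run should be near IB / (IB - 2) = 1.25.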


REFERENCES 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation, 
John  Wiley  &  Sons,  Inc.,  p.  130. 

2.  Krutchkoff,  R.  G.  (1970),  Probability  and  Statistical  Inference,  Gordon  and  Breach, 
pp.  54  -  56. 

3.  Walpole,  R.  E.  and  Myers,  R.  H.  (1985),  Probability  and  Statistics  for  Engineers  and 
Scientists,  Third  Edition,  Macmillan,  pp.  207  -  210. 
INPUT GUIDE

The  call  line  for  subroutine  RANFDI  is 
CALL  RANFDI  (N,IA,IB,ISEED,X,IERRS) 
or,  in  structured  programming  form, 

CALL  RANFDI 

( N  ,  IA  ,  IB 

,  ISEED 

,  X  ,  IERRS  ) 

The  parameter  list  is  given  below: 

Given  arguments 

N  :  Number  of  variates  to  be  generated 

(N  >  0) 

IA  :  Numerator degrees of freedom parameter for the F distribution
       (IA > 0)

IB  :  Denominator degrees of freedom parameter for the F distribution
       (IB > 0)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

X(N)  :  Output array of dimension N containing the generated F variates

IERRS  :  Input  error  flag 

(  0  :  No  input  errors 

1  :  Number  of  variates  out  of  range 

2  :  Numerator  degrees  of  freedom  parameter  out  of  range 

3  :  Denominator  degrees  of  freedom  parameter  out  of 

range 

4  :  Seed  out  of  range  ) 
RANGAM


PURPOSE


Subroutine RANGAM generates n gamma random variates from a gamma distribution
with parameters a and b (a and b both > 0). The form of the gamma distribution used is

$$f(x) = \frac{1}{\Gamma(a) \, b^{a}} \, x^{a - 1} e^{-x/b} , \quad 0 < x < \infty , \quad a > 0 , \quad b > 0$$

The mean and variance of the gamma distribution are
mean = ab

variance = a b^2


FEATURES 

The  generation  scheme  used  in  RANGAM  is  written  in  two  parts,  keying  upon  whether 
the  parameter  a  is  integral  or  nonintegral.  A  more  computationally  efficient  algorithm  is 
included to handle the special case in which a > 1 and b = 1.


REFERENCES 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation, 
John  Wiley  &  Sons,  Inc.,  pp.  203  -  204  and  pp.  208  -  209. 

2.  Knuth, D. E. (1981), The Art of Computer Programming, Volume 2 / Seminumerical
Algorithms, Second Edition, Addison-Wesley, p. 129.


INPUT  GUIDE 

The  call  line  for  subroutine  RANGAM  is 
CALL  RANGAM  (N,A,B,ISEED,X,IERROR) 
or,  in  structured  programming  form, 

CALL RANGAM

(N  ,  A  ,  B 

,  ISEED 

,  X  ,  IERROR  ) 

The  parameter  list  is  given  below: 

Given  arguments 

N  :  Number  of  variates  to  be  generated 

(N  >  0) 

A  :  Shape  parameter  of  the  gamma  distribution 

(A  >  0) 

B  :  Scale  parameter  of  the  gamma  distribution 

(B  >  0) 

Both 

ISEED  :  Integer  seed 

(1 < ISEED < $2^{31} - 1$)

Yielded  arguments 

X(N)  :  Output  array  of  dimension  N  containing  the  generated  gamma 

variates 

IERROR  :  Input  error  flag 

(  0  :  No  input  errors 

1  :  Number  of  variates  out  of  range 

2  :  Shape  parameter  A  out  of  range 

3  :  Scale  parameter  B  out  of  range 

4  :  Seed  out  of  range  ) 
RANLGS


PURPOSE

Subroutine RANLGS generates n logistic random variates from a logistic distribution
with parameters a and b (b > 0). The value of a determines the location of the distribution
on the abscissa. The value of b controls the degree of spread in the distribution. The form
of the logistic distribution used is

$$f(x) = \frac{e^{-(x - a)/b}}{b \left( 1 + e^{-(x - a)/b} \right)^{2}} , \quad -\infty < x < \infty , \quad -\infty < a < \infty , \quad b > 0$$

The mean and variance of the logistic distribution are
mean = a

variance = (b π)^2 / 3


REFERENCES 

1.  Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John  Wiley  &  Sons,  Inc.,  p.  201  and  p.  241. 

2.  Johnson,  N.  L.  and  Kotz,  S.  (1970),  Continuous  Univariate  Distributions  -  2, 
Houghton  Mifflin,  pp.  1  -  5. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANLGS  is 
CALL  RANLGS  (N,A,B,ISEED,X,IERROR) 
or,  in  structured  programming  form, 

CALL  RANLGS 

G  (N  ,  A  ,  B 

B  ,  ISEED 

Y  ,  X  ,  IERROR )

The  parameter  list  is  given  below: 
Given arguments

N  :  Number of variates to be generated
      (N > 0)

A  :  Location parameter (mean) of the logistic distribution

B  :  Scale parameter of the logistic distribution
      (B > 0)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

X(N)  :  Output array of dimension N containing the generated logistic variates

IERROR  :  Input error flag
           ( 0 : No input errors
             1 : Number of variates out of range
             2 : Scale parameter B out of range
             3 : Seed out of range )
RANLOG


PURPOSE


Subroutine RANLOG generates n lognormal random variates from a lognormal distribution
with parameters FMU and SIG (SIG > 0). FMU is the mean and SIG is the
standard deviation of the underlying normal distribution. For the normal distribution, the
value of FMU determines the distribution location on the abscissa, while the value of SIG
controls the degree of spread in the normal random variates. The form of the lognormal
distribution used is

$$f(x) = \frac{1}{SIG \cdot x \, (2\pi)^{1/2}} \, e^{-(\ln x - FMU)^{2} / (2 \, SIG^{2})} , \quad 0 < x < \infty , \quad -\infty < FMU < \infty , \quad SIG > 0$$

The mean and variance of the lognormal distribution are
mean = $e^{FMU + SIG^{2}/2}$

variance = $e^{2 FMU + SIG^{2}} \left( e^{SIG^{2}} - 1 \right)$


FEATURES 

The generation scheme used in RANLOG is based on the fact that if Y has a normal
distribution with mean FMU and variance $SIG^{2}$, then $X = e^{Y}$ has a lognormal
distribution with mean

$$FX = e^{FMU + SIG^{2}/2}$$

and variance

$$SX2 = e^{2 FMU + SIG^{2}} \left( e^{SIG^{2}} - 1 \right)$$

The Box-Muller procedure (Reference 1) is used to generate the required normal variates.

If the user wishes to specify FX and SX2 instead of FMU and $SIG^{2}$, the following
expressions can be used to calculate the corresponding values of FMU and SIG required
by subroutine RANLOG (SIG is the square root of $SIG^{2}$):

$$FMU = 0.5 \, \ln\left( FX^{4} / (SX2 + FX^{2}) \right)$$

$$SIG^{2} = \ln\left( (SX2 / FX^{2}) + 1 \right)$$
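
The conversion from a desired lognormal mean and variance to the parameters of the underlying normal can be checked numerically. The Python sketch below uses hypothetical function names and illustrative input values; it round-trips the expressions given above.

```python
import math

def lognormal_params(fx, sx2):
    """Return (FMU, SIG) of the underlying normal for lognormal mean fx, variance sx2."""
    sig2 = math.log(sx2 / fx ** 2 + 1.0)
    fmu = 0.5 * math.log(fx ** 4 / (sx2 + fx ** 2))
    return fmu, math.sqrt(sig2)        # RANLOG expects SIG, not SIG squared

def lognormal_moments(fmu, sig):
    """Return (mean, variance) of the lognormal with parameters fmu and sig."""
    mean = math.exp(fmu + sig ** 2 / 2.0)
    var = math.exp(2.0 * fmu + sig ** 2) * (math.exp(sig ** 2) - 1.0)
    return mean, var

fmu, sig = lognormal_params(3.0, 4.0)  # illustrative target mean 3, variance 4
mean, var = lognormal_moments(fmu, sig)
```

Feeding the computed `fmu` and `sig` back through the moment expressions recovers the target mean and variance to within round-off.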
REFERENCES

1.  Box, G. E. P. and Muller, M. E. (1958), “A Note on the Generation of Random
Normal Deviates”, Annals of Mathematical Statistics, Volume 29, pp. 610 - 611.

2.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation, 
John  Wiley  &  Sons,  Inc.,  p.  214. 

3.  Lindgren,  B.  W.  (1968),  Statistical  Theory,  Second  Edition,  The  Macmillan  Com¬ 
pany,  p.  176. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANLOG  is 


CALL RANLOG (N,FMU,SIG,ISEED,X,IERROR)

or, in structured programming form,

CALL RANLOG

G  ( N  ,  FMU  ,  SIG

B  ,  ISEED

Y  ,  X  ,  IERROR )

The parameter list is given below:

Given arguments

N  :  Number of variates to be generated
      (N > 0)

FMU  :  Mean of the underlying normal distribution

SIG  :  Standard deviation of the underlying normal distribution
        (SIG > 0)

Both

ISEED  :  Integer seed
          (1 < ISEED < $2^{31} - 1$)

Yielded arguments

X(N)  :  Output array of dimension N containing the generated lognormal variates

IERROR  :  Input error flag
           ( 0 : No input errors
             1 : Number of variates out of range
             2 : Standard deviation of the underlying normal improperly specified
             3 : Seed out of range )
RANNOR


PURPOSE

Subroutine RANNOR generates n normal random variates from a normal distribution
with parameters fmu and sig (sig > 0). fmu is the mean of the distribution and sig is the
distribution standard deviation. The value of fmu determines the distribution location on
the abscissa, while the value of sig controls the degree of spread in the normal random
variates. The form of the normal distribution used is

$$f(x) = \frac{1}{sig \, (2\pi)^{1/2}} \, e^{-(x - fmu)^{2} / (2 \, sig^{2})} , \quad -\infty < x < \infty , \quad -\infty < fmu < \infty , \quad sig > 0$$

The mean and variance of the normal distribution are
mean = fmu
variance = sig^2

FEATURES 

The  generation  scheme  used  in  RANNOR  is  known  as  the  Box-Muller  procedure  (see 
Reference  1). 
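
The Box-Muller procedure turns two independent U(0,1) numbers into two independent standard normal deviates. The Python sketch below illustrates the procedure with a hypothetical function name; it is not the RANNOR source, and Python's `random` module stands in for URNG.

```python
import math
import random

def rannor_sketch(n, fmu, sig, seed):
    """Normal variates with mean fmu and standard deviation sig via Box-Muller."""
    if n <= 0 or sig <= 0:
        raise ValueError("n or sig out of range")
    rng = random.Random(seed)                   # stand-in for URNG
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()                 # in (0, 1], so log(u1) is finite
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        z1 = r * math.cos(2.0 * math.pi * u2)   # two independent N(0,1) deviates
        z2 = r * math.sin(2.0 * math.pi * u2)
        out.extend((fmu + sig * z1, fmu + sig * z2))
    return out[:n]                              # drop the spare deviate when n is odd
```

With fmu = 5 and sig = 2, a long run should have a sample mean near 5 and a sample variance near 4.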


REFERENCES 

1.  Box, G. E. P. and Muller, M. E. (1958), “A Note on the Generation of Random
Normal Deviates”, Annals of Mathematical Statistics, Volume 29, pp. 610 - 611.

2.  Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John Wiley & Sons, Inc., pp. 211 - 213.


INPUT  GUIDE 

The  call  line  for  subroutine  RANNOR  is 
CALL  RANNOR  (N,FMU,SIG,ISEED,X,IERROR) 
or,  in  structured  programming  form, 

CALL RANNOR

( N  ,  FMU  ,  SIG 

,  ISEED 

,  X  ,  IERROR  ) 

The  parameter  list  is  given  below: 

Given  arguments 

N  :  Number  of  variates  to  be  generated 

(N  >  0) 

FMU  :  Mean  of  the  normal  distribution 

SIG  :  Standard  deviation  of  the  normal  distribution 

(SIG  >  0) 

Both 

ISEED  :  Integer  seed 

(1 < ISEED < $2^{31} - 1$)

Yielded  arguments 

X(N)  :  Output  array  of  dimension  N  containing  the  generated  normal 

variates 

IERROR  :  Input error flag

(  0  :  No input errors

1  :  Number of variates out of range

2  :  Standard deviation of the normal improperly specified

3  :  Seed  out  of  range  ) 
RANNVE


PURPOSE


Subroutine RANNVE generates n random normal vectors, each of length ip, from a
multivariate normal distribution with mean vector fmu and variance-covariance matrix A.
The form of the multivariate normal distribution used is

$$f(\mathbf{x}) = (2\pi)^{-\mathit{ip}/2} \, |A|^{-1/2} \, e^{-\frac{1}{2} (\mathbf{x} - \mathit{fmu})' A^{-1} (\mathbf{x} - \mathit{fmu})}$$

where

x  :  An ip-dimensional vector of random variables following a
      multivariate normal distribution

fmu  :  An ip-dimensional vector containing the means of each of
        the ip variables

A  :  A symmetric matrix of dimension ip by ip containing the
      variances and covariances of the ip variables

|A|  :  The determinant of matrix A

Each of the ip variables ranges from $-\infty$ to $\infty$. Each element of the mean vector
fmu ranges from $-\infty$ to $\infty$. The mean and variance of the multivariate normal
distribution are

mean = fmu
variance = A


FEATURES 

In employing RANNVE the user must choose among three input options, each
pertaining to the form of the symmetric matrix A:

(1)  A  is  input  as  a  full  variance-covariance  matrix;  i.e.,  the  (i,i)th  diagonal  element 
is  the  variance  of  the  ith  variable  while  the  (i,j)th  off-diagonal  element  represents 
the  covariance  between  the  ith  and  jth  variables.  (Covariances  can  be  positive, 
negative  or  zero.) 

(2) A is input as a “pseudo-correlation” matrix in which the (i,j)th element represents
the correlation (a value ranging from -1.0 to 1.0) between the ith and jth variables
and the (i,i)th diagonal element represents the standard deviation of the ith
variable.

(3) A is input as a diagonal matrix; i.e., the (i,i)th diagonal element is the variance
of the ith variable while the off-diagonal elements are all zero, implying that the
ip variables are uncorrelated.

The  details  of  the  generation  scheme  used  in  RANNVE  are  found  in  References  2  and 
4. 
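
The approach described in References 2 and 4 rests on factoring the variance-covariance matrix as A = CC' with C lower triangular, and then forming x = fmu + Cz, where z holds ip independent standard normal variates. The Python sketch below illustrates that idea only; it is not the RANNVE source. The function names `cholesky_lower` and `mvn_vectors`, and the use of Python's `random.Random` in place of the STATLIB seed handling, are assumptions made for the illustration.

```python
import math
import random


def cholesky_lower(a):
    """Return lower triangular c with c * c' = a (a symmetric, positive definite)."""
    p = len(a)
    c = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                c[i][i] = math.sqrt(s)
            else:
                c[i][j] = s / c[j][j]
    return c


def mvn_vectors(n, fmu, a, seed):
    """Generate n vectors x = fmu + c z, with z independent standard normals."""
    rng = random.Random(seed)  # stand-in for the library's own generator
    c = cholesky_lower(a)
    p = len(fmu)
    vectors = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        vectors.append(
            [fmu[i] + sum(c[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        )
    return vectors
```

Because C is lower triangular, the ith component of each vector depends only on the first i standard normal variates, which is why the inner sum stops at k = i.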


REFERENCES 

1. Browne, E. T. (1958), Introduction to the Theory of Determinants and Matrices,
University of North Carolina Press, pp. 120 - 121.

2. Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John Wiley & Sons, Inc., pp. 215 - 217.

3. Morrison, D. F. (1967), Multivariate Statistical Methods, McGraw-Hill, pp. 80 - 81.

4.  Scheuer,  E.  and  Stoller,  D.  S.  (1962),  “On  the  Generation  of  Normal  Random 
Vectors”,  Technometrics,  Volume  4  (May),  pp.  278  -  281. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANNVE  is 

CALL  RANNVE  (N,IP,FMU,A,IOPTION,ISEED,AC,C,Z,X,IERROR) 
or, in structured programming form,

CALL RANNVE
     ( N , IP , FMU , A , IOPTION
     , ISEED
     , AC , C , Z , X , IERROR )

The  parameter  list  is  given  below: 

Given arguments

N          :  Number of random variates to be generated
              (N > 0)

IP         :  Length of the desired random vectors
              (IP > 0)

FMU(IP)    :  Mean vector of the multivariate normal distribution

A(IP,IP)   :  Input matrix for the multivariate normal distribution
              (A must be symmetric and positive definite)

IOPTION    :  Parameter specifying which input option the user has selected
              for matrix A
              (IOPTION = 1, 2, or 3)

Both

ISEED      :  Integer seed
              (1 < ISEED < 2**31 - 1)

Yielded arguments

AC(IP,IP)  :  A matrix of dimension IP by IP used in testing matrix A for
              positive definiteness

C(IP,IP)   :  A lower triangular matrix of dimension IP by IP used in the
              generation of random normal vectors

Z(IP)      :  A vector of dimension IP containing IP independent normal
              random variates each with zero mean and unit variance

X(N,IP)    :  Output array of dimension N by IP containing the N generated
              random normal vectors each of length IP

IERROR     :  Input error flag
              ( 0 : No input errors
                1 : Number of random normal vectors out of range
                2 : Vector length out of range
                3 : Matrix input option out of range
                4 : At least one variance or standard deviation improperly
                    specified
                5 : Matrix A is not symmetric
                6 : At least one correlation out of range under option 2
                7 : Matrix A is not positive definite
                8 : Seed out of range )


COMMENTS 

The  user  should  exercise  care  in  inputting  matrix  A.  Regardless  of  the  input  option 
chosen,  A  must  be  symmetric.  If  option  1  or  3  is  selected,  A  is  a  variance-covariance 
matrix  and  must  be  positive  definite.  If  option  2  is  chosen,  A  is  a  pseudo-correlation  matrix 
which  is  transformed  within  RANNVE  to  a  variance-covariance  matrix  and  then  checked 
for  positive  definiteness.  For  a  discussion  of  positive  definite  matrices  see  Reference  1. 
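
The checks described above can be sketched as follows. This is a hedged illustration in Python, not the RANNVE code: the function names are hypothetical, the option 2 conversion uses the standard relation covariance(i,j) = correlation(i,j) x sd(i) x sd(j), and positive definiteness is tested by requiring every elimination pivot to be positive, which for a symmetric matrix is equivalent to all leading principal minors being positive. How RANNVE itself uses the work matrix AC is not stated in this document.

```python
def pseudo_corr_to_cov(a):
    """Option 2: diagonal holds standard deviations, off-diagonal holds correlations."""
    p = len(a)
    sd = [a[i][i] for i in range(p)]
    return [
        [sd[i] ** 2 if i == j else a[i][j] * sd[i] * sd[j] for j in range(p)]
        for i in range(p)
    ]


def is_symmetric(a, tol=1e-12):
    """True when a[i][j] and a[j][i] agree to within tol for every pair."""
    p = len(a)
    return all(abs(a[i][j] - a[j][i]) <= tol for i in range(p) for j in range(i))


def is_positive_definite(a):
    """For symmetric a: every pivot of Gaussian elimination must be positive."""
    p = len(a)
    w = [row[:] for row in a]  # work copy so the caller's matrix is left unchanged
    for k in range(p):
        if w[k][k] <= 0.0:
            return False
        for i in range(k + 1, p):
            f = w[i][k] / w[k][k]
            for j in range(k, p):
                w[i][j] -= f * w[k][j]
    return True
```

For example, a pseudo-correlation matrix with standard deviations 2 and 3 and correlation 0.5 converts to variances 4 and 9 with covariance 3.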



In the user’s main program the array X(N,IP) must be dimensioned so as to conform
exactly to the input values of N and IP. For example, if the user wishes to generate 100
vectors each of length 3, the X array must be dimensioned as X(100,3) rather than to a size
larger than actually required, say, X(500,10). Using a larger dimension than necessary will
yield invalid results.


RANPDI


PURPOSE 

Subroutine RANPDI is designed to generate nrn random numbers from one of the
distribution types in the Pearson system of frequency curves. Any distribution which is
determined by its mean ($\mu$) and its second, third, and fourth central moments
($\mu_2$, $\mu_3$, $\mu_4$) is a member of the Pearson family of distributions. This family
contains distributions which are bell-shaped, J-shaped, L-shaped, and U-shaped. Within
these four general shapes is a continuum of skewed, flattened, and peaked curves.

Admissible values of the moments $\mu$, $\mu_2$, $\mu_3$, and $\mu_4$ are:

$-\infty < \mu < \infty$
$\mu_2 > 0$
$-\infty < \mu_3 < \infty$
$\mu_4 > 0$

$\mu_2$ represents the distribution variance, $\mu_3$ is related to the degree of skewness (lack of
symmetry about the mean of the distribution), and $\mu_4$ is related to the amount of kurtosis
(peakedness) that the distribution exhibits.

The  Pearson  family  provides  an  excellent  source  of  distributions  which  depart  from 
normality  (symmetric  and  bell-shaped  distribution)  in  varying  degrees  of  skewness  and 
kurtosis.  These  latter  measures  of  departure  from  normality  are  given  by  the  expressions 

$\beta_1$ (skewness) $= \mu_3^2 / \mu_2^3$

$\beta_2$ (kurtosis) $= \mu_4 / \mu_2^2$

For purposes of reference the values of skewness and kurtosis for the normal distribution
are $\beta_1 = 0$ and $\beta_2 = 3$, respectively. Since the assumption of normality of data is integral to
many statistical procedures, RANPDI can be very useful in designing simulation studies to
evaluate the effect of violations of the normality assumption.
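
As a worked illustration of the two measures, the short Python sketch below computes them from the central moments, assuming the usual Pearson-system definitions (skewness as the squared third moment over the cubed variance, kurtosis as the fourth moment over the squared variance). The function name `pearson_shape` is hypothetical and is not part of STATLIB.

```python
def pearson_shape(mu2, mu3, mu4):
    """Return (beta1, beta2) from the second, third and fourth central moments."""
    if mu2 <= 0.0 or mu4 <= 0.0:
        raise ValueError("mu2 and mu4 must be positive")
    beta1 = mu3 ** 2 / mu2 ** 3   # skewness measure
    beta2 = mu4 / mu2 ** 2        # kurtosis measure
    return beta1, beta2


# Normal distribution with variance 4: mu3 = 0 and mu4 = 3 * 4**2 = 48.
normal_shape = pearson_shape(4.0, 0.0, 48.0)

# Exponential distribution with unit mean: mu2 = 1, mu3 = 2, mu4 = 9.
exponential_shape = pearson_shape(1.0, 2.0, 9.0)
```

The normal case reproduces the reference values 0 and 3, while the exponential case gives 4 and 9, a strongly skewed and peaked distribution.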

Subroutine RANPDI allows generation of random variates from nine of the distribution
types in the Pearson system. Of these nine, three (Types 1, 4, and 6) are called main types
while the remaining six (Types 2, 3, 5, 7, 10, and 13) are referred to as transition types.

There are limits on the admissible values of skewness and kurtosis which will give rise
to one of the Pearson distribution types treated in subroutine RANPDI. Hence, not all valid
choices of the moments $\mu$, $\mu_2$, $\mu_3$, and $\mu_4$ will enable the user to obtain Pearson random
variates.

For  further  details  regarding  the  Pearson  system,  especially  the  functional  forms  of  the 
distribution  types,  the  user  should  consult  References  1  and  2. 


FEATURES 

Subroutine RANPDI takes the values of the moments $\mu$, $\mu_2$, $\mu_3$, and $\mu_4$ that the user
inputs  and  determines  the  Pearson  distribution  type  to  which  these  moments  correspond 
most  closely.  The  constant  term  associated  with  this  distribution  is  then  computed,  if  it 
exists.  Next,  a  cumulative  distribution  function  (cdf)  table  of  size  10000  is  generated  for 
the  Pearson  distribution  type  determined.  The  generation  scheme  used  in  RANPDI  is  based 
on  interpolation  within  this  cdf  table. 
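
Generation by interpolation within a cdf table can be sketched as below. This is a minimal Python illustration under stated assumptions, not the RANPDI implementation: the table is built by trapezoidal integration of a density on an evenly spaced grid, a uniform variate is located in the table by binary search, and the variate is obtained by linear interpolation between the two bracketing grid points. The function names and the example density are hypothetical.

```python
import bisect
import random


def build_cdf_table(pdf, lo, hi, size=10000):
    """Tabulate the cdf of pdf on [lo, hi] at size evenly spaced points."""
    xs = [lo + (hi - lo) * i / (size - 1) for i in range(size)]
    cdf = [0.0]
    for i in range(1, size):
        area = 0.5 * (pdf(xs[i - 1]) + pdf(xs[i])) * (xs[i] - xs[i - 1])
        cdf.append(cdf[-1] + area)
    total = cdf[-1]
    return xs, [c / total for c in cdf]  # normalise so the last entry is 1


def sample_from_table(xs, cdf, n, seed):
    """Invert the tabulated cdf by linear interpolation."""
    rng = random.Random(seed)  # stand-in for the library's own generator
    out = []
    for _ in range(n):
        u = rng.random()
        j = bisect.bisect_right(cdf, u)          # first index with cdf[j] > u
        j = min(max(j, 1), len(cdf) - 1)
        c0, c1 = cdf[j - 1], cdf[j]
        t = 0.0 if c1 == c0 else (u - c0) / (c1 - c0)
        out.append(xs[j - 1] + t * (xs[j] - xs[j - 1]))
    return out


# Example density f(x) = 2x on (0, 1), whose mean is 2/3.
xs, cdf = build_cdf_table(lambda x: 2.0 * x, 0.0, 1.0, size=1001)
variates = sample_from_table(xs, cdf, 20000, seed=2024)
```

A finer table reduces the interpolation error at the cost of memory, which is presumably the reason for the table size of 10000 quoted above.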


REFERENCES 

1. Elderton, W. P. and Johnson, N. L. (1969), Systems of Frequency Curves, Cambridge
University Press, pp. 35 - 95.

2. Taub, A. E. (1974), DURG - A Documentation of the Dahlgren Universal Random
Number Generator, NWL TN-K-17774, NSWC, Dahlgren, Virginia 22448.


INPUT  GUIDE 

The  call  line  for  subroutine  RANPDI  is 
CALL RANPDI (NRN,MU,ISEED,PDRN,IDIST,IERROR)
or, in structured programming form,

CALL RANPDI
G    ( NRN , MU
B    , ISEED
Y    , PDRN , IDIST , IERROR )

The parameter list is given below:

Given arguments

NRN        :  Number of variates to be generated
              (NRN > 0)

MU(4)      :  Input array of dimension 4 containing the moments $\mu$, $\mu_2$, $\mu_3$,
              and $\mu_4$ of the distribution from which variates are to be
              generated
              (MU(2) > 0 and MU(4) > 0)

Both

ISEED      :  Integer seed
              (1 < ISEED < 2**31 - 1)

Yielded arguments

PDRN(NRN)  :  Output array of dimension NRN containing the generated
              Pearson variates

IDIST      :  Pearson distribution type determined by the values of the
              moments MU(1), MU(2), MU(3), and MU(4)

IERROR     :  Input error flag
              ( 0 : No input errors
                1 : Number of variates out of range
                2 : MU(2) and/or MU(4) out of range
                3 : Seed out of range
                4 : The moments MU(1), MU(2), MU(3), and MU(4)
                    do not determine any of the Pearson distribution
                    types treated by subroutine RANPDI
                5 : Constant term for the Pearson distribution type
                    selected on the basis of the moments MU(1), MU(2),
                    MU(3), and MU(4) cannot be computed )


COMMENTS 

A graph displaying the admissible region of values for skewness and kurtosis can be
found on page 3 of Reference 2. This can aid the user in determining input values for the
moments $\mu$, $\mu_2$, $\mu_3$, and $\mu_4$. If the skewness and kurtosis values do not lie in the admissibility
region, subroutine RANPDI will return IERROR = 4.


RANTDI


PURPOSE 


Subroutine RANTDI generates n T random variates from a T distribution with
parameter NU (called the degrees of freedom of the T distribution) where NU is a positive
integer. The form of the T distribution used is


$$f(x) = \frac{\Gamma[(NU + 1)/2]}{\Gamma(NU/2)\,(\pi \cdot NU)^{1/2}} \left(1 + \frac{x^2}{NU}\right)^{-(NU + 1)/2}$$

$-\infty < x < \infty$

$NU = 1, 2, 3, \ldots$


The  mean  and  variance  of  the  T  distribution  are 

mean  =  0  for  NU  >  1 

variance  =  NU  /  (NU  -  2)  for  NU  >  2 


FEATURES 

The generation scheme used in RANTDI is based on the relationship between the
normal, chi-square and T distributions; namely, the T distribution with NU degrees of
freedom is equivalent to the ratio of a standard normal variate (Z) to the square root of a
chi-square variate (Y) with NU degrees of freedom divided by its degrees of freedom:

$Z / (Y/NU)^{1/2}$
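
The ratio above translates directly into code. The Python sketch below is an illustration only: the function name `t_variates` is hypothetical, Python's `random.Random` stands in for the STATLIB generator, and the chi-square variate is formed as a sum of NU squared standard normal variates, which is one valid construction but not necessarily the one RANTDI uses.

```python
import math
import random


def t_variates(n, nu, seed):
    """Generate n variates as Z / sqrt(Y / nu), Y chi-square with nu degrees of freedom."""
    if n <= 0 or nu <= 0:
        raise ValueError("n and nu must be positive")
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # Assumption: chi-square built as the sum of nu squared standard normals.
        y = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        out.append(z / math.sqrt(y / nu))
    return out
```

With NU = 10 the sample variance of a long run should settle near NU / (NU - 2) = 1.25, in line with the variance formula given above.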


REFERENCES 

1. Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John Wiley & Sons, Inc., p. 213.

2. Krutchkoff, R. G. (1970), Probability and Statistical Inference, Gordon and Breach,
pp. 56 - 57.

3. Walpole, R. E. and Myers, R. H. (1985), Probability and Statistics for Engineers and
Scientists, Third Edition, Macmillan, pp. 202 - 207.


INPUT GUIDE

The call line for subroutine RANTDI is
CALL RANTDI (N,NU,ISEED,X,IERRS)
or, in structured programming form,

CALL RANTDI
     ( N , NU
     , ISEED
     , X , IERRS )


The parameter list is given below:

Given arguments

N        :  Number of variates to be generated
            (N > 0)

NU       :  Degrees of freedom parameter of the T distribution
            (NU > 0)

Both

ISEED    :  Integer seed
            (1 < ISEED < 2**31 - 1)

Yielded arguments

X(N)     :  Output array of dimension N containing the generated
            T variates

IERRS    :  Input error flag
            ( 0 : No input errors
              1 : Number of variates out of range
              2 : Degrees of freedom parameter out of range
              3 : Seed out of range )


RANUNI


PURPOSE 


Subroutine  RANUNI  generates  n  uniform  random  variates  from  a  continuous  uniform 
distribution  over  the  interval  a  to  b  (a  <  b).  The  form  of  the  continuous  uniform  distribution 
used  is 


$f(x) = 1/(b - a)$

$a < x < b$
$-\infty < a < b < \infty$


The mean and variance of the continuous uniform distribution are

mean = $(a + b)/2$
variance = $(b - a)^2/12$
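
As a quick check of these formulas, the Python sketch below rescales a uniform variate on (0, 1) to the interval a to b. It is an illustration only: the function name `uniform_variates` is hypothetical and Python's `random.Random` stands in for the STATLIB generator.

```python
import random


def uniform_variates(n, a, b, seed):
    """Generate n variates on the interval a to b by rescaling uniform (0, 1) variates."""
    if n <= 0 or not a < b:
        raise ValueError("need n > 0 and a < b")
    rng = random.Random(seed)
    return [a + (b - a) * rng.random() for _ in range(n)]
```

For a = 2 and b = 5 the formulas give a mean of 3.5 and a variance of 0.75, and a long run of generated variates should reproduce both closely.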


REFERENCE 

1.  Fishman,  G.  S.  (1973),  Concepts  and  Methods  in  Discrete  Event  Digital  Simulation, 
John  Wiley  &  Sons,  Inc.,  pp.  200  -  202. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANUNI  is 
CALL  RANUNI  (N,A,B,ISEED,X,IERROR) 
or, in structured programming form,

CALL RANUNI
G    ( N , A , B
B    , ISEED
Y    , X , IERROR )

The  parameter  list  is  given  below: 

Given arguments

N        :  Number of variates to be generated
            (N > 0)

A        :  Lower limit of the generation interval
            (A < B)

B        :  Upper limit of the generation interval
            (B > A)

Both

ISEED    :  Integer seed
            (1 < ISEED < 2**31 - 1)

Yielded arguments

X(N)     :  Output array of dimension N containing the generated uniform
            variates

IERROR   :  Input error flag
            ( 0 : No input errors
              1 : Number of variates out of range
              2 : Generation interval improperly specified
              3 : Seed out of range )



RANCIR


PURPOSE 

Subroutine RANCIR generates n pairs of points uniformly within a circle of radius a
centered at the point (hx, hy). At the user’s option, the points are generated in either
rectangular (i.e., cartesian) or polar coordinates. If polar coordinates are requested, the
generated angles are positive and are expressed in degrees.


RANCIR utilizes the following sets of equations relating rectangular (x, y) and polar
(r, θ) coordinates (see, for example, Reference 2):

$x = r \cos\theta$
$y = r \sin\theta$          (1)

and

$r^2 = x^2 + y^2$
$\tan\theta = y/x$          (2)

The set (1) transforms polar coordinates to rectangular coordinates, while the set (2)
performs the reverse transformation.
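
The generation method itself is documented only in the unpublished note cited as Reference 1, so the Python sketch below substitutes a standard technique and should be read as an assumption: the radius is drawn as a times the square root of a uniform variate (which makes the points uniform over the area rather than clustered at the center), the angle is drawn uniformly, and set (1) converts to rectangular coordinates before the center offset is added. The helper `to_polar_degrees` applies set (2) about the origin; whether RANCIR measures polar coordinates from the origin or from the circle center is not stated here. All function names are hypothetical.

```python
import math
import random


def circle_points(n, a, hx, hy, seed):
    """Generate n cartesian points uniformly within the circle of radius a at (hx, hy)."""
    if n <= 0 or a <= 0.0:
        raise ValueError("need n > 0 and a > 0")
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        r = a * math.sqrt(rng.random())       # sqrt gives uniform density over area
        theta = 2.0 * math.pi * rng.random()
        points.append((hx + r * math.cos(theta), hy + r * math.sin(theta)))  # set (1)
    return points


def to_polar_degrees(x, y):
    """Set (2): radial distance and a non-negative angle in degrees."""
    r = math.hypot(x, y)
    theta = math.degrees(math.atan2(y, x)) % 360.0
    return r, theta
```

Using `atan2` rather than the bare ratio y/x resolves the quadrant, and the modulo keeps the reported angle positive, matching the convention described in the PURPOSE paragraph.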


REFERENCES 

1.  Crigler,  J.  R.,  “A  Note  on  the  Generation  of  Coordinate  Pairs  Uniformly  Within  a 
Circle  of  Radius  A”,  unpublished  note  to  K106  files  dated  April  9,  1982. 

2.  Goodman,  A.  W.  (1965),  Analytic  Geometry  and  the  Calculus,  The  Macmillan 
Company,  pp.  375  -  377. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANCIR  is 
CALL  RANCIR  (N,A,HX,HY,IT,ISEED,X,Y,IERROR) 
or, in structured programming form,

CALL RANCIR
     ( N , A , HX , HY , IT
     , ISEED
     , X , Y , IERROR )

The  parameter  list  is  given  below: 


Given arguments

N        :  Number of pairs to be generated
            (N > 0)

A        :  Radius of the circle
            (A > 0)

HX       :  X-coordinate (cartesian) of the center of the circle

HY       :  Y-coordinate (cartesian) of the center of the circle

IT       :  Type of coordinates requested, i.e.,
            = 1 for cartesian coordinates
            = 2 for polar coordinates

Both

ISEED    :  Integer seed
            (1 < ISEED < 2**31 - 1)

Yielded arguments

X(N)     :  Output array of dimension N containing the generated
            x-coordinates (radial distances if polar coordinates requested)

Y(N)     :  Output array of dimension N containing the generated
            y-coordinates (positive angles in degrees if polar coordinates
            requested)

IERROR   :  Input error flag
            ( 0 : No input errors
              1 : Number of pairs out of range
              2 : Circle radius out of range
              3 : Coordinate type out of range
              4 : Seed out of range )


COMMENTS 

The user should note that the circle center must be specified in cartesian coordinates
on input.


RANWEI


PURPOSE 


Subroutine  RANWEI  generates  n  Weibull  random  variates  from  a  Weibull  distribution 
with  parameters  a,  b,  and  c.  a  is  interpreted  as  the  location  parameter  (determines  where 
the  distribution  is  on  the  abscissa),  b  as  the  shape  parameter  ( b  >  0),  and  c  as  the  scale 
parameter  (c  >  0)  of  the  Weibull.  The  form  of  the  Weibull  distribution  used  is 


$$f(x) = \frac{b}{c^{b}}\,(x - a)^{b-1}\,\exp\!\left[-\left(\frac{x - a}{c}\right)^{b}\right]$$

$a < x < \infty$
$-\infty < a < \infty$,  $b > 0$,  $c > 0$

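The mean and variance of the Weibull distribution are

mean = $a + c\,\Gamma\!\left(\frac{b+1}{b}\right)$

variance = $c^2 \left\{ \Gamma\!\left(\frac{b+2}{b}\right) - \left[\Gamma\!\left(\frac{b+1}{b}\right)\right]^2 \right\}$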
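
One common way to generate Weibull variates is the inverse transform: setting the cumulative distribution function 1 - exp[-((x - a)/c)^b] equal to a uniform variate u and solving gives x = a + c(-ln(1 - u))^(1/b). The Python sketch below uses that approach as an assumption; this document does not state which method RANWEI uses, and the function names are hypothetical.

```python
import math
import random


def weibull_variates(n, a, b, c, seed):
    """Generate n Weibull variates with location a, shape b and scale c."""
    if n <= 0 or b <= 0.0 or c <= 0.0:
        raise ValueError("need n > 0, b > 0 and c > 0")
    rng = random.Random(seed)
    # 1 - u lies in (0, 1], so the logarithm is always defined.
    return [a + c * (-math.log(1.0 - rng.random())) ** (1.0 / b) for _ in range(n)]


def weibull_mean_variance(a, b, c):
    """Theoretical mean and variance in terms of the gamma function."""
    g1 = math.gamma((b + 1.0) / b)
    g2 = math.gamma((b + 2.0) / b)
    return a + c * g1, c * c * (g2 - g1 * g1)
```

Comparing the sample mean and variance of a long run against `weibull_mean_variance` is a convenient sanity check on a chosen set of parameters.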


REFERENCES 

1. Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John  Wiley  &  Sons,  Inc.,  p.  211. 

2.  Johnson,  N.  L.  and  Kotz,  S.  (1970),  Continuous  Univariate  Distributions  -  1, 
Houghton  Mifflin,  pp.  250  -  252. 

3.  Walpole,  R.  E.  and  Myers,  R.  H.  (1985),  Probability  and  Statistics  for  Engineers  and 
Scientists,  Third  Edition,  Macmillan,  pp.  156  -  157. 


INPUT  GUIDE 

The  call  line  for  subroutine  RANWEI  is 
CALL  RANWEI  (N,A,B,C,ISEED,X,IERROR) 
or, in structured programming form,

CALL RANWEI
G    ( N , A , B , C
B    , ISEED
Y    , X , IERROR )

The parameter list is given below:

Given arguments

N        :  Number of variates to be generated
            (N > 0)

A        :  Location parameter of the Weibull

B        :  Shape parameter of the Weibull
            (B > 0)

C        :  Scale parameter of the Weibull
            (C > 0)

Both

ISEED    :  Integer seed
            (1 < ISEED < 2**31 - 1)

Yielded arguments

X(N)     :  Output array of dimension N containing the generated Weibull
            variates

IERROR   :  Input error flag
            ( 0 : No input errors
              1 : Number of variates out of range
              2 : Shape parameter out of range
              3 : Scale parameter out of range
              4 : Seed out of range )


RANMK1


PURPOSE 

Subroutine RANMK1 generates an autocorrelated sequence, $X_n$, of length n where $X_n$
is a first-order Markov process with parameter alpha (-1 < alpha < 1). The sequence is
generated via the recursive expression

$X_n - \mathit{fmu} = \mathit{alpha} \cdot (X_{n-1} - \mathit{fmu}) + Z_n$.

$Z_n$ is the random error term, assumed to be normally distributed with mean 0 and standard
deviation sig (sig > 0). The generation scheme assumes that $X_n$ is a normal process with
mean fmu and variance given by

$\mathit{sig}^2 / (1 - \mathit{alpha}^2)$.

$X_n$ will be a stationary process provided that -1 < alpha < 1. We note that a first-order
Markov process is synonymous with a first-order autoregressive (AR) process.
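
The recursion can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the RANMK1 code: the function name `markov1_sequence` is hypothetical, Python's `random.Random` stands in for the STATLIB generator, and the starting value is drawn from the stationary normal distribution with mean fmu and variance sig^2/(1 - alpha^2), which is one natural choice but is not confirmed by this document.

```python
import math
import random


def markov1_sequence(n, fmu, sig, alpha, seed):
    """Generate n terms of X_n = fmu + alpha * (X_{n-1} - fmu) + Z_n."""
    if n <= 0 or sig <= 0.0 or not -1.0 < alpha < 1.0:
        raise ValueError("need n > 0, sig > 0 and -1 < alpha < 1")
    rng = random.Random(seed)
    # Assumption: start from the stationary distribution of the process.
    prev = rng.gauss(fmu, sig / math.sqrt(1.0 - alpha * alpha))
    seq = []
    for _ in range(n):
        z = rng.gauss(0.0, sig)               # random error term Z_n
        cur = fmu + alpha * (prev - fmu) + z
        seq.append(cur)
        prev = cur
    return seq
```

For alpha = 0.6 and sig = 1 the stationary variance is 1/(1 - 0.36) = 1.5625, and the lag-one sample autocorrelation of a long sequence should be close to alpha.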


REFERENCE 

1. Fishman, G. S. (1973), Concepts and Methods in Discrete Event Digital Simulation,
John  Wiley  &  Sons,  Inc.,  pp.  236  -  237. 

INPUT  GUIDE 

The  call  line  for  subroutine  RANMK1  is 
CALL RANMK1 (N,FMU,SIG,ALPHA,ISEED,Y,X,IERROR)
or, in structured programming form,

CALL RANMK1
G    ( N , FMU , SIG , ALPHA
B    , ISEED , Y
Y    , X , IERROR )

The  parameter  list  is  given  below: 

Given arguments

N        :  Length of the sequence to be generated
            (N > 0)

FMU      :  Mean of the normal process

SIG      :  Standard deviation of the normally distributed error term
            (SIG > 0)

ALPHA    :  Value of the Markov model parameter
            (-1 < ALPHA < 1)

Both

ISEED    :  Integer seed
            (1 < ISEED < 2**31 - 1)

Y(N)     :  Array of dimension N containing the normally distributed
            error terms for the first-order Markov process

Yielded arguments

X(N)     :  Output array of dimension N containing the generated
            first-order Markov sequence

IERROR   :  Input error flag
            ( 0 : No input errors
              1 : Sequence length out of range
              2 : Standard deviation of the normally distributed error
                  term improperly specified
              3 : First-order Markov model parameter improperly
                  specified
              4 : Seed out of range )


GLOSSARY


The documentation of the programs and subroutines in STATLIB contains several
instances of mathematical notation which may not be familiar to the reader. Rather than
explaining this notation each time it appears, these items have been collected
into this glossary for definition and easy reference.


1.  $n!$
    Factorial notation;
    $n! = 1 \times 2 \times 3 \times \cdots \times (n-1) \times n$.

2.  $\binom{n}{x}$
    Combinatorial notation; the number of combinations of n different
    things taken x at a time;
    $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$

3.  $\Gamma(x)$
    Gamma function;
    $\Gamma(x) = \int_0^{\infty} e^{-t}\, t^{x-1}\, dt$,  $0 < x < \infty$.

4.  $I_x(a,b)$
    Incomplete Beta function ratio;
    $I_x(a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^{x} t^{a-1} (1 - t)^{b-1}\, dt$,  $0 < x < 1$,  $a, b > 0$